lucene-java-user mailing list archives

From Kevin Burton <>
Subject Re: Fastest way to fetch N documents with unique keys within large numbers of indexes..
Date Tue, 07 Jun 2005 05:17:04 GMT
Matt Quail wrote:

>> We have a system where I'll be given 10 or 20 unique keys.
> I assume you mean you have one unique-key field, and you are given  
> 10-20 values to find for this one field?
>> Internally I'm creating a new Term and then calling  
>> IndexReader.termDocs() on this term.  Then if  
>> matches then I'll return this document.
> Are you calling reader.termDocs() inside a (tight) loop? It might be  
> better to create one TermEnum, and use "seek". Something like this:

Yes, this is another approach I was considering. I was thinking of
building up a list of indexes that had a high probability of holding
the given document and then searching each of them.
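One cheap way to build that candidate list, assuming each index can carry a small in-memory summary of its keys, is a Bloom filter per index. This is a swapped-in technique, not something Lucene provides here, and `CandidateIndexes`, `KeyFilter` and `candidates` are hypothetical names. A Bloom filter may report false positives but never false negatives, so the index that actually holds a key is never skipped; only the candidates then need the real, disk-bound term lookup.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class CandidateIndexes {
    // Hypothetical per-index key summary: a tiny Bloom filter. It can report
    // false positives but never false negatives.
    static final class KeyFilter {
        private final BitSet bits;
        private final int size;

        KeyFilter(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Two bit positions per key, derived from String.hashCode()
        // (an illustrative, untuned choice).
        private int[] positions(String key) {
            int h = key.hashCode();
            int g = Integer.rotateLeft(h, 16) ^ 0x9E3779B9;
            return new int[] {Math.floorMod(h, size), Math.floorMod(g, size)};
        }

        void add(String key) {
            for (int p : positions(key)) bits.set(p);
        }

        boolean mightContain(String key) {
            for (int p : positions(key)) {
                if (!bits.get(p)) return false;
            }
            return true;
        }
    }

    // Positions of the indexes whose filter says the key might be present.
    static List<Integer> candidates(List<KeyFilter> filters, String key) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).mightContain(key)) out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        List<KeyFilter> filters = new ArrayList<>();
        for (int i = 0; i < 4; i++) filters.add(new KeyFilter(1 << 12));
        filters.get(0).add("doc-17");
        filters.get(2).add("doc-42");
        // Always includes 2; may rarely include 0 as a false positive.
        System.out.println(candidates(filters, "doc-42"));
    }
}
```

The filters would have to be rebuilt or updated whenever an index changes, and their size trades memory against the false-positive rate, so whether this beats simply probing every index is something the benchmarks would have to show.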

What worries me, though, is that it might be a bit slower. I'm just
going to have to test different implementations and see.
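A minimal sketch of the reuse pattern from the quote: allocate one enumerator outside the loop and re-position it for each key instead of creating a new one per key. To keep it runnable without Lucene on the classpath, `Cursor` and `MapCursor` are hypothetical stand-ins for `TermDocs`. Against a real index the same loop would use `reader.termDocs()` (the no-argument, unpositioned form) once, `termDocs.seek(new Term(field, key))` per key, and `termDocs.close()` at the end; a `TermDocs` should not be shared between threads, so each thread would need its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReuseCursor {
    // Hypothetical stand-in for Lucene's TermDocs, reduced to three calls.
    interface Cursor {
        void seek(String key);
        boolean next();
        int doc();
    }

    // In-memory postings (key -> doc ids), purely for illustration.
    static final class MapCursor implements Cursor {
        private final Map<String, int[]> postings;
        private int[] current = new int[0];
        private int pos = -1;

        MapCursor(Map<String, int[]> postings) { this.postings = postings; }

        public void seek(String key) {
            current = postings.getOrDefault(key, new int[0]);
            pos = -1; // re-position; next() must be called before doc()
        }

        public boolean next() { return ++pos < current.length; }

        public int doc() { return current[pos]; }
    }

    // One cursor allocated by the caller, re-used for every key.
    static List<Integer> lookup(Cursor cursor, List<String> keys) {
        List<Integer> hits = new ArrayList<>();
        for (String key : keys) {
            cursor.seek(key);
            if (cursor.next()) { // unique keys: at most one doc per key
                hits.add(cursor.doc());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("a", new int[] {3});
        postings.put("c", new int[] {9});
        Cursor cursor = new MapCursor(postings);
        System.out.println(lookup(cursor, Arrays.asList("a", "b", "c"))); // prints [3, 9]
    }
}
```

This only removes per-key allocation; each `seek()` still does the dictionary lookup, so it does not by itself address the disk cost.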


> I'm pretty sure that will work. And if you can avoid the multi- 
> threading issues, you might try and use the same TermDocs object for  
> as long as possible (that is, move it up out of as many tight loops  
> as you can).

Well... that doesn't look like the biggest overhead. The bottleneck
seems to be in seek() and the fact that it's using an InputStream to read
bytes off disk. I tried to speed that up by cranking the
InputStream.BUFFER_SIZE value higher, but that didn't help either, though
I'm not sure whether that's a caching issue. I sent an email to the list
about this earlier but no one responded.

So it seems my bottleneck is in seek(), and it would make sense to
figure out how to limit the cost of those calls.

Is seek() O(log(N)), by the way, or is it O(N)?
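As far as I know, Lucene's term dictionary (`TermInfosReader` in the 1.4 sources) keeps only every `indexInterval`-th term (128 by default) in memory, binary-searches that, then seeks the file and scans forward. If that's right, a lookup is roughly O(log(N/128)) in-memory comparisons plus up to 128 sequential term reads from disk, per index: closer to O(log N) than O(N), but every call still pays a disk seek and a short scan. The model below is plain Java, not Lucene code; `SparseTermIndex`, `lookup` and `scanned` are made-up names.

```java
import java.util.Arrays;

public class SparseTermIndex {
    // Sorted term dictionary (standing in for the on-disk file) plus an
    // in-memory index holding every interval-th term.
    private final String[] terms;
    private final String[] indexTerms;
    private final int interval;
    int scanned; // terms read sequentially so far

    SparseTermIndex(String[] sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) indexTerms[i] = sortedTerms[i * interval];
    }

    // Returns the position of term, or -1 if absent.
    int lookup(String term) {
        // 1) Binary search the in-memory index: O(log(N / interval)).
        int block = Arrays.binarySearch(indexTerms, term);
        if (block < 0) block = -block - 2; // last index term <= term
        if (block < 0) return -1;          // sorts before every term
        // 2) Sequential scan of at most `interval` terms from that block.
        int end = Math.min(terms.length, (block + 1) * interval);
        for (int i = block * interval; i < end; i++) {
            scanned++;
            int c = terms[i].compareTo(term);
            if (c == 0) return i;
            if (c > 0) return -1;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[10_000];
        for (int i = 0; i < terms.length; i++) terms[i] = String.format("t%05d", i);
        SparseTermIndex dict = new SparseTermIndex(terms, 128);
        int pos = dict.lookup("t05000");
        // prints: 5000 found after scanning 9 terms
        System.out.println(pos + " found after scanning " + dict.scanned + " terms");
    }
}
```

If I remember the `TermInfosReader` source correctly, it also skips the binary search and seek when the requested term sorts after the enumerator's current position within the same block, so looking keys up in sorted order might save some seeks; that is worth checking against the source before relying on it.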



Use Rojo (RSS/Atom aggregator)! - visit 
See #rojo if you want to chat.

Rojo is Hiring! -

   Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412 
