lucene-java-user mailing list archives

From Kevin Burton <>
Subject Fastest way to fetch N documents with unique keys within large numbers of indexes..
Date Tue, 07 Jun 2005 01:41:05 GMT

I'm trying to figure out the FASTEST way to solve this problem.

We have a system where I'm given 10 or 20 unique keys, which are 
stored as non-tokenized fields within Lucene.  Each key identifies 
exactly one document.

Internally I'm creating a new Term and then calling 
IndexReader.termDocs() with it.  If there's a match, 
I return that document.
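For reference, a minimal sketch of this lookup pattern, assuming Lucene's 1.4-era API (IndexReader.termDocs() plus TermDocs.seek(Term)).  The field name "key" and the class/method names here are illustrative, not from the original message.  Sorting the keys first means each term-dictionary seek moves forward instead of jumping around:

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class KeyedLookup {
    /** Fetches the documents for the given unique keys from one index.
     *  Keys are sorted first so the term-dictionary seeks advance in order. */
    public static Document[] fetch(IndexReader reader, String[] keys)
            throws IOException {
        String[] sorted = (String[]) keys.clone();
        Arrays.sort(sorted);
        Document[] results = new Document[sorted.length];
        TermDocs termDocs = reader.termDocs();  // one reusable TermDocs
        try {
            for (int i = 0; i < sorted.length; i++) {
                termDocs.seek(new Term("key", sorted[i]));
                if (termDocs.next()) {           // key is unique: at most one hit
                    results[i] = reader.document(termDocs.doc());
                }
            }
        } finally {
            termDocs.close();
        }
        return results;
    }
}
```

Reusing a single TermDocs across sorted keys is the usual way to keep the per-key overhead down; whether it helps here would have to be confirmed in the profiler.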

The problem is that this isn't fast enough.  This is not an 
academic debate, as I've put the system in a profiler and Lucene is the 
top bottleneck (by far).

I don't think there's anything faster than this, right?  Could I maybe 
cache a TermEnum, keep it as a pointer to the FIRST of these ID 
terms, and reuse it?  That might let me seek to the 
start of my terms faster.

Does Lucene internally do a binary search for my term?
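For what it's worth, Lucene's term dictionary in this era keeps every Nth term in memory (the index interval, 128 by default), binary-searches that in-memory index to pick a block, and then linearly scans the on-disk terms within that block.  Here is a self-contained conceptual sketch of that two-level lookup; the class name, method names, and the interval of 4 are illustrative, not Lucene's internals verbatim:

```java
import java.util.Arrays;

// Conceptual sketch of how Lucene's term dictionary locates a term:
// a sparse in-memory index holds every INDEX_INTERVAL-th term; a binary
// search over that index picks a block, then a short linear scan within
// the block finds the exact term.
public class TermSeekSketch {
    static final int INDEX_INTERVAL = 4; // Lucene's default was 128

    /** Returns the position of target in the sorted terms array, or -1. */
    public static int seek(String[] sortedTerms, String target) {
        // Build the sparse in-memory index: every INDEX_INTERVAL-th term.
        int indexSize = (sortedTerms.length + INDEX_INTERVAL - 1) / INDEX_INTERVAL;
        String[] index = new String[indexSize];
        for (int i = 0; i < indexSize; i++) {
            index[i] = sortedTerms[i * INDEX_INTERVAL];
        }

        // Binary search the sparse index for the greatest entry <= target.
        int pos = Arrays.binarySearch(index, target);
        int block = pos >= 0 ? pos : -pos - 2; // insertion point minus one
        if (block < 0) return -1;              // target sorts before all terms

        // Linear scan within the chosen block (like the on-disk term scan).
        int start = block * INDEX_INTERVAL;
        int end = Math.min(start + INDEX_INTERVAL, sortedTerms.length);
        for (int i = start; i < end; i++) {
            if (sortedTerms[i].equals(target)) return i;
        }
        return -1;
    }
}
```

So each uncached lookup costs a binary search in RAM plus up to indexInterval on-disk term reads, which is where repeated single-key lookups across many indexes add up.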

I could of course merge all this content into one index, but that's a 
separate problem.  We have a lot of indexes, often more than 40, 
and constantly merging them into a multi-gigabyte index just takes FOREVER.
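That merge is inherently a rewrite of every source index, which is why it takes so long.  A sketch, assuming the 1.4-era IndexWriter.addIndexes(Directory[]) API; the paths and class name are illustrative:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    /** Merges the source indexes into a fresh index at destPath. */
    public static void merge(String destPath, String[] srcPaths)
            throws IOException {
        IndexWriter writer = new IndexWriter(destPath, new StandardAnalyzer(),
                                             true /* create */);
        Directory[] dirs = new Directory[srcPaths.length];
        for (int i = 0; i < srcPaths.length; i++) {
            dirs[i] = FSDirectory.getDirectory(srcPaths[i], false);
        }
        writer.addIndexes(dirs); // rewrites everything: cost grows with total size
        writer.close();
    }
}
```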

It seems that internally IO is the problem.  I'm about as fast on IO as I 
can get, as I'm on a RAID0 array of FAST SCSI disks.  I 
also tried tweaking InputStream.BUFFER_SIZE with no visible change in 
performance.



   Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412 
