lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris McGee <cbmc...@ca.ibm.com>
Subject How to improve performance of large numbers of successive searches?
Date Thu, 10 Apr 2008 18:34:32 GMT
Hello,

I am building fairly large directories (200-500 MB of disk space) using 
lucene-java. Sometimes it can take upwards of 10-15 mins to create the 
documents and write them to disk using my current configuration. I have 
upgraded to the latest 2.3.1 version and followed many of the 
recommendations offered on the wiki: 

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

These tips have significantly improved the time to build the directory and 
search it. However, I have noticed that when I perform term queries using 
a searcher many times in rapid succession and iterate over all of the hits 
it can take a significant time. To perform 1000 term query searches each 
with around 2000 hits it takes well over a minute. The time seems to vary 
linearly based on the number of searches (ie. 10 times more searches take 
10 times longer). I tried combining the searches into a BooleanQuery but 
it only shaves off a small percentage (5-10%) of the total time.

I was wondering if there is a faster way to retrieve all of the results 
for my large collections of terms without using more memory and without 
taking more time to build the directory? I already looked at bypassing the 
searcher and using the IndexReader.termDocs() method directly to retrieve 
the documents but there did not seem to be much performance improvement. 
In the majority of my cases I am simplying looking for a large number of 
values to the same field. Also, I'm not interested in scoring results 
based on frequency or weights I need to retrieve all of the results 
anyway.

Any help with this would be great.

Thanks,
Chris McGee
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message