lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Armbrust <daniel.armbrust.l...@gmail.com>
Subject Re: Any problems with a failed IndexWriter optimize call?
Date Mon, 01 Aug 2005 20:34:01 GMT
May I suggest:

Don't call optimize.  You don't need it.  Here is my approach: 

Keep each one of your 250,000 document indexes separate - so run your 
batch, build the index, and then just close it.  Don't try to optimize 
it.  For each 250,000 document batch, just put it into a different folder.

Now, when you have finished building your entire index, you will have a 
bunch of different unoptimized lucene indexes.  Open up a new, blank 
index, and merge all of your other indexes into this one.  The end 
result will be a single large (already optimized) index.


This approach has several benefits -
You can keep the parameters set in such a way that it performs better 
while indexing (without running into the out of file handles issues)
If a failure occurs, you only have to redo the batch, not start over the 
entire process.
You don't have unnecessary IO, but constantly rewriting your data with 
optimize() calls.
You can very easily break up the indexing across multiple machines.
If a failure occurs while trying to merge all of the indexes together, 
you don't lose anything - as you are only reading the existing indexes.  
You know they will all still be valid.

I actually wrote a wrapper for Lucene that does all of this under the 
covers.  At some point, I should get it released open source :)

Dan

-- 
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message