lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Most efficient way to index 14M documents (out of memory/file handles)
Date Wed, 07 Jul 2004 16:54:29 GMT
A mergeFactor of 5000 is a bad idea.  If you want to index faster, try 
increasing minMergeDocs instead.  If you have lots of memory, this can 
probably be 5000 or higher.

Also, why do you optimize before you're done?  That only slows things 
down.  Perhaps you have to do it because you've set mergeFactor to such 
an extreme value?  I do not recommend a merge factor higher than 100.

Doug
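
For illustration, a minimal sketch of that setup (assuming the Lucene 1.x
IndexWriter API of the time, where mergeFactor and minMergeDocs are public
fields; the index path and field contents below are placeholders):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class BulkIndex {
      public static void main(String[] args) throws Exception {
          // Create a new index; the path and analyzer are placeholders.
          IndexWriter writer =
              new IndexWriter("/tmp/big-index", new StandardAnalyzer(), true);

          writer.mergeFactor = 10;      // modest merge factor keeps open segments (and file handles) bounded
          writer.minMergeDocs = 5000;   // buffer more documents in RAM before a segment is flushed to disk

          for (int i = 0; i < 14000000; i++) {
              Document doc = new Document();
              doc.add(Field.Text("body", "document text goes here"));
              writer.addDocument(doc);
          }

          writer.optimize();   // a single optimize at the very end, not every 50k documents
          writer.close();
      }
  }

The idea is that a larger minMergeDocs does the buffering in RAM, so fewer
on-disk segments accumulate between merges and file handles stay bounded
without calling optimize() mid-stream.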


Kevin A. Burton wrote:
> I'm trying to burn an index of 14M documents.
> 
> I have two problems.
> 
> 1.  I have to run optimize() every 50k documents or I run out of file 
> handles.  This takes TIME and of course is linear in the size of the 
> index, so it just gets slower the further I get.  It starts to crawl 
> at about 3M documents.
> 
> 2.  I eventually will run out of memory in this configuration.
> 
> I KNOW this has been covered before but for the life of me I can't find 
> it in the archives, the FAQ or the wiki.
> 
> I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
> every 50k documents.
> 
> Does it make sense to just create a new IndexWriter for every 50k docs 
> and then do one big optimize() at the end?
> 
> Kevin
> 


