lucene-java-user mailing list archives

From: Nader Henein
Subject: Re: Most efficient way to index 14M documents (out of memory/file handles)
Date: Wed, 07 Jul 2004 06:24:38 GMT
Here's the thread you want:

Nader Henein

Kevin A. Burton wrote:

> I'm trying to burn an index of 14M documents.
> I have two problems.
> 1.  I have to run optimize() every 50k documents or I run out of file
> handles.  This takes TIME and of course is linear in the size of the
> index, so it just gets slower as the index grows.  It starts to
> crawl at about 3M documents.
> 2.  I eventually will run out of memory in this configuration.
> I KNOW this has been covered before but for the life of me I can't 
> find it in the archives, the FAQ or the wiki.
> I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
> every 50k documents.
> Does it make sense to just create a new IndexWriter for every 50k docs 
> and then do one big optimize() at the end?
> Kevin
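
For anyone finding this in the archives: below is a minimal sketch of one
common approach to Kevin's setup, assuming the Lucene 1.4-era API (where
mergeFactor and minMergeDocs are public fields on IndexWriter). The index
path, field name, and document contents are placeholders, not from the
original thread. The idea is to keep mergeFactor moderate so open segments
don't exhaust file handles, buffer more documents in RAM via minMergeDocs,
and run a single optimize() at the very end instead of every 50k documents.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            // "/tmp/big-index" is a hypothetical path.
            IndexWriter writer = new IndexWriter("/tmp/big-index",
                                                 new StandardAnalyzer(), true);

            // Keep mergeFactor moderate: each on-disk segment holds open
            // file handles, so a very large value (e.g. 5000) can exhaust them.
            writer.mergeFactor = 10;

            // Buffer more documents in RAM before a segment is flushed to
            // disk; this trades memory for fewer, larger on-disk segments.
            writer.minMergeDocs = 1000;

            for (int i = 0; i < 14000000; i++) {
                Document doc = new Document();
                // Placeholder field and contents.
                doc.add(Field.Text("contents", "document body " + i));
                writer.addDocument(doc);
            }

            // One optimize() at the end rather than every 50k documents.
            writer.optimize();
            writer.close();
        }
    }

As for Kevin's last question: a related variation is to build several
smaller indexes (one IndexWriter per batch) and merge them once at the end
with IndexWriter.addIndexes(Directory[]). Either way, the expensive
optimize() happens in a single pass instead of growing linearly with each
50k-document checkpoint.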

