lucene-java-user mailing list archives

From Nader Henein <...@bayt.net>
Subject Re: Most efficient way to index 14M documents (out of memory/file handles)
Date Wed, 07 Jul 2004 06:24:38 GMT
Here's the thread you want:

http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgId=1722573

Nader Henein

Kevin A. Burton wrote:

> I'm trying to burn an index of 14M documents.
>
> I have two problems.
>
> 1.  I have to run optimize() every 50k documents or I run out of file 
> handles.  This takes TIME and, of course, is linear in the size of the 
> index, so it just gets slower the further I get.  It starts to 
> crawl at about 3M documents.
>
> 2.  I will eventually run out of memory in this configuration.
>
> I KNOW this has been covered before, but for the life of me I can't 
> find it in the archives, the FAQ, or the wiki.
>
> I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
> every 50k documents.
>
> Does it make sense to just create a new IndexWriter for every 50k docs 
> and then do one big optimize() at the end?
>
> Kevin
>
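As a rough sketch of the batch-per-writer pattern Kevin asks about, against the
Lucene 1.4-era API this thread is using: close the writer after each 50k batch so
its file handles get released, and run a single optimize() at the very end. This
is illustrative, not code from the thread; the index path, field name, and
makeDoc() helper are placeholders, and the mergeFactor here is deliberately
modest, since a mergeFactor of 5000 lets thousands of segments (each made of
several files) accumulate before any merge runs.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchIndexer {

    // Placeholder document builder; real code would pull content
    // from the actual data source.
    static Document makeDoc(int i) {
        Document doc = new Document();
        doc.add(Field.Text("body", "document " + i));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        final String indexDir = "/tmp/big-index"; // illustrative path
        final int total = 14000000;
        final int batchSize = 50000;              // the 50k batches from the thread

        for (int start = 0; start < total; start += batchSize) {
            // Create the index on the first batch, append on later ones.
            IndexWriter writer = new IndexWriter(indexDir,
                                                 new StandardAnalyzer(),
                                                 start == 0);
            writer.mergeFactor = 50; // keep segment count (and open files) bounded
            int end = Math.min(start + batchSize, total);
            for (int i = start; i < end; i++) {
                writer.addDocument(makeDoc(i));
            }
            writer.close();          // releases this batch's file handles
        }

        // One final merge down to a single segment, instead of
        // optimizing every 50k documents.
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }
}

Whether closing and reopening the writer per batch beats one long-lived writer
depends on the merge settings; the thread linked above goes into the trade-offs.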
