lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley" <yo...@apache.org>
Subject Re: Indexing slows down considerably after a few million documents
Date Fri, 27 Oct 2006 14:58:18 GMT
Hi Mekin,

A couple of things:
- You might try increasing maxBufferedDocs to 1000 or so (depending on
your document size).  That controls the size of the smallest segments
and will decrease the numer of merges you end up doing.
- Try using the trunk lucene version... it has indexing enhancements
like avoiding some counting overhead when deciding if segments should
be merged.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


On 10/27/06, Mekin Maheshwari <mekin.m@gmail.com> wrote:
> I am creating an index of about 7 Million documents.
> The total size of the index is about 2.7G once indexing is done.
>
> For the 1st 3Million documents, the indexer takes about 3 hours (can i
> get better than this? )
>  - 4 seconds per thousand documents
>
> After this it slows down terribly and takes about 20 seconds for every
> thousand documents.
>
>
> It doesnt seem to be a data issue, as I tried starting to create the
> index from the 3Millionth document & I get the initial speed (1k docs
> in 3 secs)
>
> What could be going wrong?
> I have tried a few things, but cant quickly try out a lot of things as
> it takes 3 hours before the slow down happens.
>
> Below is the relevant pieces of the code::
>
> Thanks for the help,
> mekin
>
> IndexWriter writer = new IndexWriter(dirname, analyzer,doTotalIndex );
> writer.setMergeFactor(1000);
>
> while(more records to get){
>  Document doc = new Document();
> //... add fields to doc
> //add boost to doc
> //
>  writer.addDocument(doc);
> }
>
> writer.optimize();
> writer.close();

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message