lucene-java-user mailing list archives

From Paul Elschot <>
Subject Re: URGENT: Help indexing large document set
Date Wed, 24 Nov 2004 08:05:39 GMT
On Wednesday 24 November 2004 00:37, John Wang wrote:
> Hi:
>    I am trying to index 1M documents, with batches of 500 documents.
>    Each document has an unique text key, which is added as a
> Field.KeyWord(name,value).
>    For each batch of 500, I need to make sure I am not adding a
> document with a key that is already in the current index.
>   To do this, I am calling IndexSearcher.docFreq for each document and
> delete the document currently in the index with the same key:
>        while (keyIter.hasNext()) {
>             String objectID = (String) keyIter.next();
>             term = new Term("key", objectID);
>             int count = localSearcher.docFreq(term);

To speed this up a bit, make sure that the iterator returns
the keys in sorted order. I'd use an IndexReader instead
of an IndexSearcher, but that will probably not make much difference.
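A minimal, self-contained Java sketch of the sorted-lookup idea follows. It deliberately does not use Lucene: `KeyIndex` and `InMemoryIndex` are hypothetical stand-ins for an index reader's docFreq and delete-by-term calls, so the control flow can be run without an index. A TreeSet sorts the batch keys and also drops duplicate keys within the batch. The presumable benefit of sorted order is that Lucene keeps its term dictionary sorted, so successive lookups land close together; that rationale, and the assumption that String natural order matches the index's term order for plain keys, are not measured here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class SortedKeyCheck {

    /** Hypothetical stand-in for an index reader: count and delete documents by key. */
    interface KeyIndex {
        int docFreq(String key);
        int delete(String key); // returns the number of documents removed
    }

    /** In-memory KeyIndex for demonstration only; records the order of lookups. */
    static class InMemoryIndex implements KeyIndex {
        final Map<String, Integer> counts = new HashMap<>();
        final List<String> lookups = new ArrayList<>();

        void add(String key) {
            counts.merge(key, 1, Integer::sum);
        }

        public int docFreq(String key) {
            lookups.add(key);
            return counts.getOrDefault(key, 0);
        }

        public int delete(String key) {
            Integer removed = counts.remove(key);
            return removed == null ? 0 : removed;
        }
    }

    /** Deletes existing documents whose key occurs in the batch, visiting keys in sorted order. */
    static int deleteExisting(KeyIndex index, List<String> batchKeys) {
        // TreeSet orders the keys by String.compareTo and removes duplicates.
        TreeSet<String> sorted = new TreeSet<>(batchKeys);
        int deleted = 0;
        for (String key : sorted) {
            if (index.docFreq(key) > 0) {
                deleted += index.delete(key);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.add("doc-0002");
        index.add("doc-0007");
        List<String> batch = Arrays.asList("doc-0007", "doc-0001", "doc-0002", "doc-0005");
        int deleted = deleteExisting(index, batch);
        System.out.println("deleted " + deleted + ", lookup order " + index.lookups);
    }
}
```

Running `main` prints `deleted 2, lookup order [doc-0001, doc-0002, doc-0005, doc-0007]`: the unsorted batch is looked up in key order, and only the two keys already present are deleted.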

Adding the documents can be done with multiple threads.
The last time I checked, there was a moderate speedup
from using three threads instead of one on a single-CPU machine.
Tuning the values of minMergeDocs and maxMergeDocs
may also improve the performance of adding documents.
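The multi-threaded adding can be sketched in the same self-contained way. `DocumentSink` is a hypothetical stand-in for an index writer's addDocument call and is assumed to be safe to call from several threads at once; plain strings stand in for documents. The sketch spreads one batch round-robin over a fixed pool of three threads (the count from the measurement above, not a general recommendation) and waits for every worker before returning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelAdd {

    /** Hypothetical stand-in for a writer's addDocument; implementations must be thread-safe. */
    interface DocumentSink {
        void addDocument(String doc);
    }

    /** Adds all docs using nThreads workers; worker t handles indices t, t + nThreads, ... */
    static void addAll(List<String> docs, DocumentSink sink, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int offset = t;
                futures.add(pool.submit(() -> {
                    for (int i = offset; i < docs.size(); i += nThreads) {
                        sink.addDocument(docs.get(i));
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the worker and rethrows (wrapped) any exception it raised
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentLinkedQueue<String> added = new ConcurrentLinkedQueue<>();
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            docs.add("doc-" + i);
        }
        addAll(docs, added::add, 3);
        System.out.println("added " + added.size() + " documents");
    }
}
```

Running `main` should print `added 500 documents`. The merge settings mentioned above are writer configuration rather than threading, so they are left out of the sketch; their names and defaults are best checked against the IndexWriter documentation for the Lucene version in use.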

Paul Elschot
