lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: URGENT: Help indexing large document set
Date Wed, 24 Nov 2004 08:05:39 GMT
On Wednesday 24 November 2004 00:37, John Wang wrote:
> Hi:
> 
>    I am trying to index 1M documents, with batches of 500 documents.
> 
>    Each document has an unique text key, which is added as a
> Field.KeyWord(name,value).
> 
>    For each batch of 500, I need to make sure I am not adding a
> document with a key that is already in the current index.
> 
>   To do this, I am calling IndexSearcher.docFreq for each document and
> delete the document currently in the index with the same key:
>  
>        while (keyIter.hasNext()) {
>             String objectID = (String) keyIter.next();
>             term = new Term("key", objectID);
>             int count = localSearcher.docFreq(term);

To speed this up a bit make sure that the iterator gives
the terms in sorted order. I'd use an index reader instead
of a searcher, but that will probably not make a difference.

Adding the documents can be done with multiple threads.
Last time I checked that, there was a moderate speed up
using three threads instead of one on a single CPU machine.
Tuning the values of minMergeDocs and maxMergeDocs
may also help to increase performance of adding documents.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message