lucene-dev mailing list archives

From Bogdan Ghidireac <bog...@ecstend.com>
Subject IndexWriter.updateDocument performance improvement
Date Fri, 20 Nov 2009 12:21:56 GMT
Hi,

One of the use cases of my application involves updating the index with
10 to 10k docs every few minutes. Because we maintain a primary key (PK)
for each doc, we have to use IndexWriter.updateDocument to keep the index
consistent.

The average time for an update when we commit every 10k docs is around
17ms (the IndexWriter buffer is 100MB). I profiled the application for
several hours and I noticed that most of the time is spent in
IndexWriter.applyDeletes()->TermDocs.seek(). I changed
BufferedDeletes.terms from a HashMap to a TreeMap so that the buffered
delete terms are applied in sorted order, which reduces the number of
random seeks on disk.
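
To illustrate the idea, here is a minimal, self-contained sketch in plain
Java. It is not the Lucene code: DeleteTerm, buffer() and iterationOrder()
are hypothetical names made up for this example, and the Integer value is
only a placeholder for whatever per-term bookkeeping the real map holds.
It only shows that a HashMap hands the buffered terms back in an arbitrary
order, while a TreeMap hands them back sorted, so lookups against a sorted
term dictionary would tend to move forward instead of jumping around.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedDeletesSketch {

    // Hypothetical stand-in for a delete term: ordered by field, then by text.
    static final class DeleteTerm implements Comparable<DeleteTerm> {
        final String field;
        final String text;

        DeleteTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(DeleteTerm other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DeleteTerm)) return false;
            DeleteTerm t = (DeleteTerm) o;
            return field.equals(t.field) && text.equals(t.text);
        }

        @Override
        public int hashCode() {
            return 31 * field.hashCode() + text.hashCode();
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    // Buffers one delete term per primary key into the given map.
    // The Integer value is a placeholder, not Lucene's actual bookkeeping.
    static Map<DeleteTerm, Integer> buffer(Map<DeleteTerm, Integer> target, String... pks) {
        int counter = 0;
        for (String pk : pks) {
            target.put(new DeleteTerm("pk", pk), counter++);
        }
        return target;
    }

    // Returns the order in which the buffered terms would be visited.
    static List<String> iterationOrder(Map<DeleteTerm, Integer> buffered) {
        List<String> order = new ArrayList<>();
        for (DeleteTerm t : buffered.keySet()) {
            order.add(t.toString());
        }
        return order;
    }

    public static void main(String[] args) {
        String[] pks = {"doc-0907", "doc-0003", "doc-0451", "doc-0120"};
        // HashMap: order depends on hash codes, so it is effectively arbitrary.
        System.out.println("HashMap: " + iterationOrder(buffer(new HashMap<>(), pks)));
        // TreeMap: terms come back sorted by field, then text.
        System.out.println("TreeMap: " + iterationOrder(buffer(new TreeMap<>(), pks)));
    }
}
```

The trade-off is that TreeMap insertions cost O(log n) instead of the
HashMap's expected O(1), which in my case is negligible next to the disk
seeks saved.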

I ran my tests again with the patched Lucene 2.9.1 and the average
update time dropped from 17ms to 2ms. The index is 18GB and holds
70 million docs.

I cannot send a patch because my company has strict, time-consuming
policies about open source, but the change is small and can be
applied easily.

Regards,
Bogdan

