lucene-dev mailing list archives

From Bogdan Ghidireac <>
Subject IndexWriter.applyDeletes performance
Date Fri, 05 Mar 2010 14:18:21 GMT

I have an index with 100 million documents that takes around 20GB on disk,
with an update rate of a few hundred docs per minute. New docs are
grouped into batches and indexed once every few minutes. My problem is
that update performance has degraded badly as the index has grown
(in distinct docs).

My indexing flow looks like this:

0. create indexWriter (only once)
1. get the open indexWriter
2. for each doc call indexWriter.updateDocument(pkTerm, doc)
3. indexWriter.commit
4. indexWriter.waitForMerges
5. wait for new docs and goto 1.
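The steps above can be sketched roughly as follows against the Lucene 2.9 API. The
directory path, the field name "pk", the analyzer choice, and the nextBatch()
helper are assumptions for illustration; only the IndexWriter calls mirror the
flow described above.

```java
import java.io.File;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // 0. create the IndexWriter once and keep it open
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/index")),
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);

        while (true) {
            // 1-2. for each doc in the batch, update by primary key
            for (Document doc : nextBatch()) {
                Term pkTerm = new Term("pk", doc.get("pk"));
                // deletes any existing doc with this pk, then adds the new one
                writer.updateDocument(pkTerm, doc);
            }
            writer.commit();        // 3. commit (this is where deletes are applied)
            writer.waitForMerges(); // 4. block until background merges finish
            // 5. wait for the next batch of docs, then loop
        }
    }

    // Hypothetical stub: in the real code this would pull the next batch
    // of pending documents from whatever queue feeds the indexer.
    private static List<Document> nextBatch() {
        return Collections.emptyList();
    }
}
```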

I ran a profiler for several minutes and noticed that most of the time
the indexer is busy applying the deletes. This takes so long because
all terms are reloaded on every commit (see the attached profiler
screenshot).

The index writer has a pool of readers, but it is not used unless
near-real-time search is enabled. I changed my code to force the pool
to be used, but the only way I can do this is to request a reader that
is never used, via writer.getReader(). Of course, memory consumption is
higher now because the terms are kept in memory, but steps 3+4 complete
in 1-2 secs compared to 8-10 secs.
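The workaround amounts to something like the following sketch (Lucene 2.9 API).
The idea, as I understand it, is that asking the writer for a near-real-time
reader keeps its pooled SegmentReaders warm, so applyDeletes can reuse them on
the next commit instead of reloading every term; the helper name is my own.

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class ReaderPoolHack {
    // Hypothetical helper: called once after each commit purely for its
    // side effect of enabling the writer's internal reader pool.
    static void warmReaderPool(IndexWriter writer) throws Exception {
        // getReader() flushes and returns a pooled near-real-time reader;
        // we never search with it, so close it right away.
        IndexReader unused = writer.getReader();
        unused.close();
    }
}
```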

Is it possible to enable the reader pool at the IndexWriter
constructor level? My current method looks like a hack ...
I am using Lucene 2.9.2 on Linux.

