lucene-dev mailing list archives

From Bogdan Ghidireac <>
Subject Re: IndexWriter.applyDeletes performance
Date Mon, 08 Mar 2010 11:29:48 GMT

> But... how long does step 2 take?  Is it an option to not commit on
> every update?  How many docs do you typically update?

I do not commit on every update; I call commit once every 10k
documents. Indexing 10k docs takes around 10 secs.

> If you are committing only so that an outside reader can reopen, you
> should consider just using an NRT reader instead (assuming the reader
> is in same JVM as IndexWriter).

My service is just an indexer, so I don't need a reader. The new segments
are pushed to a searcher box after each commit.

> Roughly how much more RAM consumption do you see when you force pooling?

pooling not forced -> memory after explicit GC: 50 MB
pooling forced     -> memory after explicit GC: 250 MB

Thank you for opening the JIRA issue.


> Mike
> On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac <> wrote:
>> Hi,
>> I have an index with 100 million docs that takes around 20GB on disk and
>> an update rate of a few hundred docs per minute. The new docs are
>> grouped in batches and indexed once every few minutes. My problem is
>> that the update performance has degraded too much over time as the index
>> has grown in size (distinct docs).
>> My indexing flow looks like this:
>> 0. create indexWriter (only once)
>> 1. get the open indexWriter
>> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
>> 3. indexWriter.commit
>> 4. indexWriter.waitForMerges
>> 5. wait for new docs and goto 1.
>> I ran a profiler for several minutes and I noticed that most of the
>> time the indexer is busy applying the deletes. This takes so much time
>> because all terms are loaded for every commit (see the attached
>> profiler screenshot).
>> The index writer has a pool of readers, but they are not used unless
>> near real time is enabled. I changed my code to force the pool to be
>> used, but the only way I can do this is to request a reader that is
>> never used: writer.getReader(). Of course, the memory consumption is
>> higher now because I have the terms in memory, but steps 3+4 complete in
>> 1-2 secs compared to 8-10 secs.
>> Is it possible to enable the readers pool at the IndexWriter
>> constructor level? My current method looks like a hack ...
>> I am using Lucene 2.9.2 on Linux.
>> Regards,
>> Bogdan
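
For readers following the thread, the effect described above can be sketched with a toy model. This is not Lucene code. ToyWriter and its methods are hypothetical stand-ins that only mirror the names used in the message (updateDocument, commit, getReader). The model assumes a single term dictionary that is reloaded on every commit unless a pooled reader keeps it open, which is a simplification of what the profiler output suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's IndexWriter. All names here are hypothetical.
final class ReaderPoolSketch {

    /** Stand-in writer that counts how often the term dictionary is loaded. */
    static final class ToyWriter {
        private final List<String> pendingDeletes = new ArrayList<>();
        private boolean poolReaders = false; // off unless a reader was requested
        private boolean termsCached = false; // true while a pooled reader holds the terms
        int termLoads = 0;

        // Mirrors the getReader() workaround: asking for a reader turns pooling on.
        void getReader() { poolReaders = true; }

        // Mirrors updateDocument(pkTerm, doc): buffers a delete-by-term (the add is omitted).
        void updateDocument(String pkTerm) { pendingDeletes.add(pkTerm); }

        // Mirrors commit(): buffered deletes are resolved against the terms.
        void commit() {
            if (pendingDeletes.isEmpty()) return;
            if (!termsCached) {
                termLoads++;               // the expensive step seen in the profile
                termsCached = poolReaders; // only a pooled reader stays open
            }
            pendingDeletes.clear();
        }
    }

    /** Runs the batch loop from the message and returns the number of term loads. */
    static int termLoads(boolean forcePooling, int batches, int docsPerBatch) {
        ToyWriter writer = new ToyWriter();     // step 0: create the writer once
        if (forcePooling) writer.getReader();   // the workaround: reader is never used
        for (int b = 0; b < batches; b++) {     // steps 1-5, repeated per batch
            for (int d = 0; d < docsPerBatch; d++) {
                writer.updateDocument("pk" + d); // step 2
            }
            writer.commit();                     // step 3
        }
        return writer.termLoads;
    }

    public static void main(String[] args) {
        System.out.println("no pooling: " + termLoads(false, 5, 3));
        System.out.println("pooling:    " + termLoads(true, 5, 3));
    }
}
```

In this model, five batches cost five term loads without pooling and one with pooling, at the price of keeping the terms resident between commits. That matches the direction of the timings and memory figures reported in the thread, not their magnitude.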