lucene-dev mailing list archives

From Bogdan Ghidireac <bog...@ecstend.com>
Subject Re: IndexWriter.applyDeletes performance
Date Mon, 08 Mar 2010 11:29:48 GMT
Mike,

>
> But... how long does step 2 take?  Is it an option to not commit on
> every update?  How many docs do you typically update?

I do not commit on every update; I call commit once every 10k
documents. Indexing 10k docs takes around 10 secs.
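
A minimal sketch of this batching, for illustration only. It assumes a
hypothetical DocWriter interface that stands in for the three IndexWriter
calls named in the flow quoted below (updateDocument, commit,
waitForMerges); the interface, the counting stub, and the names
BatchCommitSketch and indexAll are not Lucene API, and the remainder
handling at the end is an assumption rather than something stated in the
thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCommitSketch {

    /** Hypothetical stand-in for the IndexWriter calls used in the flow. */
    interface DocWriter {
        void updateDocument(String pk, String doc);
        void commit();
        void waitForMerges();
    }

    /** In-memory stub that only counts calls; it does no real indexing. */
    static class CountingWriter implements DocWriter {
        int updates = 0;
        int commits = 0;
        int mergeWaits = 0;

        public void updateDocument(String pk, String doc) { updates++; }
        public void commit() { commits++; }
        public void waitForMerges() { mergeWaits++; }
    }

    /**
     * Updates every document, committing after each full batch and once
     * more for a trailing partial batch. Returns the number of commits.
     */
    static int indexAll(DocWriter writer, List<String[]> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (String[] d : docs) {
            writer.updateDocument(d[0], d[1]); // d[0] = primary key, d[1] = body
            pending++;
            if (pending == batchSize) {
                writer.commit();
                writer.waitForMerges();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the remainder (assumed behaviour)
            writer.commit();
            writer.waitForMerges();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter();
        List<String[]> docs = new ArrayList<>();
        for (int i = 0; i < 25000; i++) {
            docs.add(new String[] {"pk" + i, "body " + i});
        }
        int commits = indexAll(writer, docs, 10000);
        // prints "25000 updates, 3 commits"
        System.out.println(writer.updates + " updates, " + commits + " commits");
    }
}
```

With a real IndexWriter the cost of each commit (including applying
deletes) is paid once per batch rather than once per document, which is
the reason for committing every 10k docs.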


>
> If you are committing only so that an outside reader can reopen, you
> should consider just using an NRT reader instead (assuming the reader
> is in same JVM as IndexWriter).

My service is just an indexer; I don't need a reader. The new segments
are pushed to a searcher box after each commit.


>
> Roughly how much more RAM consumption do you see when you force pooling?

pooling not forced -> memory after explicit GC: 50 MB
pooling forced -> memory after explicit GC: 250 MB

Thank you for opening the JIRA issue.

Bogdan


>
> Mike
>
> On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac <bogdan@ecstend.com> wrote:
>> Hi,
>>
>> I have an index with 100 million docs that takes around 20GB on disk and
>> an update rate of a few hundred docs per minute. The new docs are
>> grouped in batches and indexed once every few minutes. My problem is
>> that the update performance has degraded too much over time as the index
>> has increased in size (distinct docs).
>>
>> My indexing flow looks like this ..
>>
>> 0. create indexWriter (only once)
>> 1. get the open indexWriter
>> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
>> 3. indexWriter.commit
>> 4. indexWriter.waitForMerges
>> 5. wait for new docs and goto 1.
>>
>> I ran a profiler for several minutes and I noticed that most of the
>> time the indexer is busy applying the deletes. This takes so much time
>> because all terms are loaded for every commit (see the attached
>> profiler screenshot).
>>
>> The index writer has a pool of readers but they are not used unless
>> near real time is enabled. I changed my code to force the pool to be
>> used, but the only way I can do this is to request a reader that is
>> never used, via writer.getReader(). Of course, the memory consumption is
>> higher now because I have terms in memory, but steps 3+4 complete in
>> 1-2 secs compared to 8-10 secs.
>>
>> Is it possible to enable the readers pool at the IndexWriter
>> constructor level? My current method looks like a hack ...
>> I am using Lucene 2.9.2 on Linux.
>>
>> Regards,
>> Bogdan
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
