lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter.applyDeletes performance
Date Fri, 05 Mar 2010 15:25:27 GMT
Currently you can't tell IW to use the pool (i.e., the pool is only
enabled if you use NRT readers).  We should probably make this an option
at ctor time, for situations like this.  (In fact, in follow-on
discussions about further improvements to NRT we've already discussed
adding such an option to IW's ctors.)  I'll open an issue for this.

Indeed from that profiler output it looks like most of the time is
being spent opening the SegmentReaders (to do deletes), specifically
loading the terms dict index (64% overall) and loading the deleted
docs (10%).
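
To illustrate why pooling helps here, below is a toy Java model, not
Lucene code. The names ReaderPoolSketch and countReaderOpens are
hypothetical, and the model assumes a fixed set of segments with no
merges or newly flushed segments. It only counts how many times a
segment reader would have to be opened (the step that loads the terms
dict index and deleted docs) across a series of commits, with and
without a pool:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: nothing here is a Lucene class or API. */
public class ReaderPoolSketch {

    /**
     * Counts expensive "reader open" operations over a number of commits.
     * Assumption: applying deletes at commit time needs one reader per segment.
     */
    public static int countReaderOpens(int segments, int commits, boolean pooled) {
        // The pool maps a segment id to its already-opened (stand-in) reader.
        Map<Integer, Object> pool = new HashMap<>();
        int opens = 0;
        for (int c = 0; c < commits; c++) {
            for (int s = 0; s < segments; s++) {
                if (pooled && pool.containsKey(s)) {
                    continue; // reuse the pooled reader, nothing is reloaded
                }
                opens++; // models loading the terms dict index and deleted docs
                if (pooled) {
                    pool.put(s, new Object()); // keep the reader for later commits
                }
            }
        }
        return opens;
    }

    public static void main(String[] args) {
        System.out.println("unpooled opens: " + countReaderOpens(10, 5, false)); // 50
        System.out.println("pooled opens: " + countReaderOpens(10, 5, true));    // 10
    }
}
```

The pooled case trades RAM for time: every reader stays open between
commits, which matches the higher memory use reported below.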

But... how long does step 2 take?  Is it an option to not commit on
every update?  How many docs do you typically update?

If you are committing only so that an outside reader can reopen, you
should consider just using an NRT reader instead (assuming the reader
is in the same JVM as the IndexWriter).

Roughly how much more RAM consumption do you see when you force pooling?

Mike

On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac <bogdan@ecstend.com> wrote:
> Hi,
>
> I have an index with 100 million docs that takes around 20GB on disk and
> an update rate of a few hundred docs per minute. The new docs are
> grouped in batches and indexed once every few minutes. My problem is
> that update performance has degraded badly over time as the index
> has grown in size (distinct docs).
>
> My indexing flow looks like this:
>
> 0. create indexWriter (only once)
> 1. get the open indexWriter
> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
> 3. indexWriter.commit
> 4. indexWriter.waitForMerges
> 5. wait for new docs and goto 1.
>
> I ran a profiler for several minutes and I noticed that most of the
> time the indexer is busy applying the deletes. This takes so much time
> because all terms are loaded for every commit (see the attached
> profiler screenshot).
>
> The index writer has a pool of readers, but they are not used unless
> near real time is enabled. I changed my code to force the pool to be
> used, but the only way I can do this is to request a reader that is
> never used, via writer.getReader(). Of course, memory consumption is
> higher now because the terms are held in memory, but steps 3+4 complete
> in 1-2 secs compared to 8-10 secs.
>
> Is it possible to enable the readers pool at the IndexWriter
> constructor level? My current method looks like a hack ...
> I am using Lucene 2.9.2 on Linux.
>
> Regards,
> Bogdan
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
