lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter.updateDocument performance improvement
Date Fri, 20 Nov 2009 17:11:51 GMT
Opened LUCENE-2086.

Mike

On Fri, Nov 20, 2009 at 9:43 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> +1
>
> I'll open an issue.
>
> Mike
>
> On Fri, Nov 20, 2009 at 8:11 AM, Yonik Seeley
> <yonik@lucidimagination.com> wrote:
>> Thanks Bogdan, I've been meaning to bring this up.
>> Solr used a TreeMap in the past (when it handled its own deletes) for
>> the same exact reason.  In my profiling, I've also seen applyDeletes()
>> taking the bulk of the time with small/simple document indexing.
>>
>> So we should definitely go in sorted order (either via a TreeMap or by
>> sorting the HashMap's terms).
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Fri, Nov 20, 2009 at 7:21 AM, Bogdan Ghidireac <bogdan@ecstend.com> wrote:
>>> Hi,
>>>
>>> One of the use cases of my application involves updating the index with
>>> 10 to 10k docs every few minutes. Because we maintain a PK for each
>>> doc we have to use IndexWriter.updateDocument to be consistent.
>>>
>>> The average time for an update when we commit every 10k docs is around
>>> 17ms (the IndexWriter buffer is 100MB). I profiled the application for
>>> several hours and I noticed that most of the time is spent in
>>> IndexWriter.applyDeletes()->TermDocs.seek(). I changed the
>>> BufferedDeletes.terms from HashMap to TreeMap to have the terms
>>> ordered and to reduce the number of random seeks on the disk.
>>>
>>> I ran my tests again with the patched Lucene 2.9.1 and the time
>>> dropped from 17ms to 2ms. The index is 18GB and holds 70 million docs.
>>>
>>> I cannot send a patch because my company has some strict,
>>> time-consuming policies about open source, but the change is small and
>>> can be applied easily.
>>>
>>> Regards,
>>> Bogdan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
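
As a rough illustration of the change discussed in this thread, the sketch below contrasts buffering delete terms in a HashMap with buffering them in a TreeMap. It does not use Lucene's classes: SortedDeletesSketch, bufferUnsorted, bufferSorted, and backwardSteps are hypothetical names, the Integer value is only a placeholder for whatever per-term data the real BufferedDeletes.terms map holds, and counting backward steps in term order is an assumed stand-in for random seeks in a sorted term dictionary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedDeletesSketch {

    /** Buffers delete terms in a HashMap; iteration order is unrelated to term order. */
    public static Map<String, Integer> bufferUnsorted(List<String> terms) {
        Map<String, Integer> buffered = new HashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            buffered.put(terms.get(i), i); // value is a hypothetical placeholder
        }
        return buffered;
    }

    /** Buffers delete terms in a TreeMap; iteration follows the natural term order. */
    public static Map<String, Integer> bufferSorted(List<String> terms) {
        Map<String, Integer> buffered = new TreeMap<>();
        for (int i = 0; i < terms.size(); i++) {
            buffered.put(terms.get(i), i); // value is a hypothetical placeholder
        }
        return buffered;
    }

    /**
     * Counts how often the iteration moves to a term that sorts before the
     * previous one. Each such step is used here as a proxy for a backward
     * (random) seek in a term dictionary that is stored in sorted order.
     */
    public static int backwardSteps(Iterable<String> order) {
        int count = 0;
        String previous = null;
        for (String term : order) {
            if (previous != null && term.compareTo(previous) < 0) {
                count++;
            }
            previous = term;
        }
        return count;
    }

    public static void main(String[] args) {
        // 10,000 distinct primary-key terms, added in a scrambled order.
        List<String> primaryKeys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            primaryKeys.add("pk" + (i * 7919 % 10_000));
        }

        int unsorted = backwardSteps(bufferUnsorted(primaryKeys).keySet());
        int sorted = backwardSteps(bufferSorted(primaryKeys).keySet());

        System.out.println("backward steps with HashMap: " + unsorted);
        System.out.println("backward steps with TreeMap: " + sorted);
    }
}
```

The TreeMap iteration never steps backwards, which is the property the thread relies on: deletes applied in term order walk the term dictionary in one direction. The sketch says nothing about the actual timings reported above, which depend on the index and the disk.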
