lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?
Date Tue, 22 Sep 2009 15:53:40 GMT
It's not only that the newly merged segments are quickly searchable
(you could do that with warming outside of IW).

More importantly, it lets you continue to add/delete docs,
flush the segment, open a new NRT reader, and search those changes,
without waiting for the warming to complete.  You could do many such
updates while a large merged segment is being warmed in the background.

It decouples merging (which results in no change to the search
results) from the add/deletes (which do result in changes to the
search results), so that the warming due to a large merge won't hold
up the stream of updates.
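
To make the decoupling concrete, here is a rough sketch in plain Java.
It is a toy model only, not IndexWriter's actual code: ToyWriter and its
flush/getReader/merge methods are hypothetical stand-ins, and "segments"
are just names in a list.  The point it tries to show is that the warmer
runs before the merged segment is swapped into the live list and without
holding the writer's lock, so flushes and reader opens proceed meanwhile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WarmBeforeCommitSketch {

    /** Toy stand-in for a writer's live segment list (hypothetical, not Lucene). */
    public static final class ToyWriter {
        private final List<String> live = new ArrayList<String>();

        /** A flushed segment becomes visible to new readers immediately. */
        public synchronized void flush(String segment) {
            live.add(segment);
        }

        /** A "reader" here is just a point-in-time snapshot of the live segments. */
        public synchronized List<String> getReader() {
            return new ArrayList<String>(live);
        }

        /**
         * Warm the merged segment first, with no lock held, then swap it in
         * for its source segments in one synchronized step.
         */
        public void merge(List<String> sources, String merged, Runnable warmer) {
            warmer.run(); // may be slow; flush() and getReader() are not blocked
            synchronized (this) {
                live.removeAll(sources);
                live.add(merged);
            }
        }
    }

    public static void main(String[] args) {
        final ToyWriter writer = new ToyWriter();
        writer.flush("_0");
        writer.flush("_1");

        // For simplicity the "concurrent" update happens inside the warmer;
        // a real merge scheduler would run the merge on its own thread.
        writer.merge(Arrays.asList("_0", "_1"), "_2", new Runnable() {
            public void run() {
                writer.flush("_3");
                System.out.println("during warming: " + writer.getReader());
            }
        });
        System.out.println("after merge:    " + writer.getReader());
    }
}
```

Running it prints [_0, _1, _3] during warming and [_3, _2] afterwards:
the new segment _3 is searchable while readers still see the old
segments _0 and _1, and the merged segment _2 only replaces them once
warming is done.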

I think for any serious NRT app, it's a must.  (Either that, or avoid
large merges entirely.)

Mike

On Tue, Sep 22, 2009 at 11:44 AM, Jason Rutherglen
<jason.rutherglen@gmail.com> wrote:
> Adding segment warming to IW is the only way to ensure newly
> merged segments are quickly searchable without the impact
> brought up by John W regarding queries on new segments being
> slow when they load field caches.
>
> On Tue, Sep 22, 2009 at 8:37 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>> On Tue, Sep 22, 2009 at 11:08 AM, Yonik Seeley
>> <yonik@lucidimagination.com> wrote:
>>
>>> I'm still not sure I see the reason for complicating the IndexWriter
>>> with warming... can't this be done just as efficiently (if not more
>>> efficiently) in user/application space?
>>
>> It will be less efficient when you warm outside of IndexWriter, i.e.,
>> you will necessarily delay the app's net turnaround time on being able
>> to search newly added/deleted docs.
>>
>> The whole point of putting optional warming into IndexWriter was so
>> the segment could be warmed *before* the merge commits the change to
>> the writer's SegmentInfos.  Any newly opened near-real-time readers
>> continue to search the old (merged away) segments, until the warming
>> completes.
>>
>> This way the warming of merged segments is independent of making any
>> newly flushed segments searchable (as long as you use CMS, or any
>> merge scheduler that uses separate threads for merging).  New segments
>> can be flushed and then become searchable (with getReader()) even
>> while the warming is happening.
>>
>> So... if your merge policy allows large merges, setting a warmer in
>> the IndexWriter is crucial for minimizing turnaround time.  But, even
>> once you do that, merging is still IO & CPU intensive, plus IO caches
>> are unnecessarily flushed (since we can't easily madvise/posix_fadvise
>> from Java), and we have no IO scheduler control to have merging run at
>> very low priority, etc., so while the merge & warming are taking
>> place, search performance will be impacted.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
