lucene-dev mailing list archives

From Jason Rutherglen
Subject Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?
Date Tue, 22 Sep 2009 16:01:36 GMT
Right, it allows warming without holding up the opening of new readers.
I'll update the realtime wiki with this.

Thanks Mike.

On Tue, Sep 22, 2009 at 8:53 AM, Michael McCandless wrote:
> It's not only that the newly merged segments are quickly searchable
> (you could do that with warming outside of IW).
> It's more importantly so that you can continue to add/delete docs,
> flush the segment, open a new NRT reader, and search those changes,
> without waiting for the warming to complete.  You could do many such
> updates all while a large merged segment is being warmed in the BG.
> It decouples merging (which results in no change to the search
> results) from the add/deletes (which do result in changes to the
> search results), so that the warming due to a large merge won't hold
> up the stream of updates.
> I think for any serious NRT app, it's a must.  (Either that or avoid
> ever doing large merges entirely).
> Mike
> On Tue, Sep 22, 2009 at 11:44 AM, Jason Rutherglen wrote:
>> Adding segment warming to IW is the only way to ensure newly
>> merged segments are quickly searchable without the impact
>> brought up by John W regarding queries on new segments being
>> slow when they load field caches.
>> On Tue, Sep 22, 2009 at 8:37 AM, Michael McCandless wrote:
>>> On Tue, Sep 22, 2009 at 11:08 AM, Yonik Seeley wrote:
>>>> I'm still not sure I see the reason for complicating the IndexWriter
>>>> with warming... can't this be done just as efficiently (if not more
>>>> efficiently) in user/application space?
>>> It will be less efficient when you warm outside of IndexWriter, ie,
>>> you will necessarily delay the app's net turnaround time on being able
>>> to search newly added/deleted docs.
>>> The whole point of putting optional warming into IndexWriter was so
>>> the segment could be warmed *before* the merge commits the change to
>>> the writer's SegmentInfos.  Any newly opened near-real-time readers
>>> continue to search the old (merged away) segments, until the warming
>>> completes.
>>> This way the warming of merged segments is independent of making any
>>> newly flushed segments searchable (as long as you use CMS, or any
>>> merge scheduler that uses separate threads for merging).  New segments
>>> can be flushed and then become searchable (with getReader()) even
>>> while the warming is happening.
>>> So... if your merge policy allows large merges, setting a warmer in
>>> the IndexWriter is crucial for minimizing turnaround time.  But, even
>>> once you do that, merging is still IO & CPU intensive, plus IO caches
>>> are unnecessarily flushed (since we can't easily madvise/posix_fadvise
>>> from java), and we have no IO scheduler control to have merging run at
>>> much lower priority, etc., so while the merge & warming are taking
>>> place, search performance will be impacted.
>>> Mike
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
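
The decoupling described in the thread (warm the merged segment before the merge is committed, while flushes and new readers keep going) can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyWriter, Warmer, flush, merge and the segment names are hypothetical stand-ins written for this note, not Lucene's IndexWriter API, and the "warming" is just a latch that keeps the merge thread busy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Toy model only: every type here is hypothetical, not part of Lucene.
class WarmBeforeCommitSketch {

    // Hypothetical warming hook, called with the not-yet-committed merged segment.
    interface Warmer {
        void warm(String mergedSegment) throws InterruptedException;
    }

    static class ToyWriter {
        // Plays the role of the writer's SegmentInfos: what a new reader will see.
        private final List<String> segments = new ArrayList<>();

        // Flush a new small segment; the next reader sees it immediately.
        synchronized void flush(String name) {
            segments.add(name);
        }

        // "Open a reader": a point-in-time snapshot of the current segment list.
        synchronized List<String> getReader() {
            return new ArrayList<>(segments);
        }

        // Runs on a background merge thread. The warmer is invoked BEFORE the
        // merged segment is committed, and no lock is held while it runs.
        void merge(List<String> toMerge, String mergedName, Warmer warmer)
                throws InterruptedException {
            warmer.warm(mergedName);
            synchronized (this) {
                // Commit: swap the merged-away segments for the merged one.
                int at = segments.indexOf(toMerge.get(0));
                segments.removeAll(toMerge);
                segments.add(at, mergedName);
            }
        }
    }

    // Returns two reader snapshots: one taken during warming, one after commit.
    static List<List<String>> run() {
        ToyWriter writer = new ToyWriter();
        writer.flush("a");
        writer.flush("b");

        CountDownLatch warmingStarted = new CountDownLatch(1);
        CountDownLatch finishWarming = new CountDownLatch(1);

        Thread mergeThread = new Thread(() -> {
            try {
                writer.merge(Arrays.asList("a", "b"), "a+b", segment -> {
                    warmingStarted.countDown();
                    finishWarming.await(); // simulate a slow warm-up
                });
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mergeThread.start();

        try {
            warmingStarted.await();
            // Warming is in progress: updates and new readers are not blocked.
            writer.flush("c");
            List<String> duringWarming = writer.getReader();

            finishWarming.countDown();
            mergeThread.join();
            List<String> afterCommit = writer.getReader();

            return Arrays.asList(duringWarming, afterCommit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<List<String>> snapshots = run();
        System.out.println(snapshots.get(0)); // prints [a, b, c]
        System.out.println(snapshots.get(1)); // prints [a+b, c]
    }
}
```

The reader opened during warming still searches the old segments a and b plus the newly flushed c; only after the warmer returns does the merged segment replace them. Warming outside the writer would instead have to happen after the commit, which is the turnaround delay the thread is concerned about.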