lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Tue, 22 Sep 2009 23:17:38 GMT
This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
can warm the reader w/o blocking ongoing updates.

Mike
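The ordering that setMergedSegmentWarmer is meant to give can be pictured with a small, self-contained model. WarmSegment and WarmingMerger are hypothetical names, not Lucene classes, and the "merge" here just concatenates value arrays. setWarmer only loosely mirrors the real IndexWriter.setMergedSegmentWarmer hook. The point is that the cache entry for the merged segment is loaded in the merging thread, before the segment becomes visible to searches.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a segment: a name and its per-document field values.
final class WarmSegment {
    final String name;
    final int[] values;
    WarmSegment(String name, int[] values) { this.name = name; this.values = values; }
}

// Hypothetical model of warming a freshly merged segment before searchers can see it.
final class WarmingMerger {
    final Map<WarmSegment, int[]> cache = new IdentityHashMap<>(); // per-segment "field cache"
    final List<WarmSegment> visible = new ArrayList<>();            // segments searches can see
    int cacheLoads = 0;
    private Consumer<WarmSegment> warmer = seg -> { };              // default: no warming

    // Loose, hypothetical analogue of IndexWriter.setMergedSegmentWarmer.
    void setWarmer(Consumer<WarmSegment> warmer) { this.warmer = warmer; }

    // What a search does: load the segment's cache entry on first use.
    int[] load(WarmSegment seg) {
        return cache.computeIfAbsent(seg, s -> { cacheLoads++; return s.values.clone(); });
    }

    void merge(WarmSegment a, WarmSegment b) {
        int[] merged = new int[a.values.length + b.values.length];
        System.arraycopy(a.values, 0, merged, 0, a.values.length);
        System.arraycopy(b.values, 0, merged, a.values.length, b.values.length);
        WarmSegment m = new WarmSegment(a.name + "+" + b.name, merged);
        warmer.accept(m);   // runs in the merging thread, before m is searchable
        visible.remove(a);
        visible.remove(b);
        visible.add(m);     // publish the merged segment
        cache.remove(a);    // entries for merged-away segments are now dead
        cache.remove(b);
    }
}
```

Without a warmer, the first search after the merge pays the load for the whole merged segment; with one, that cost moves to the merge thread and ongoing updates are not blocked by it.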

On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
> Right - when a large segment is invalidated, you will have a bigger
> fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
> field cache every time though. Sounds like you are trying to deal with
> those large segments changing anyway :) They are always an issue when
> doing RT it seems.
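Mark's comparison can be put in numbers with a toy cost model. ReloadCost and the segment names and sizes below are made up for illustration; the model only counts documents whose cached values must be loaded after a reopen, and assumes a segment keeps its cache entry exactly when it survives the reopen unchanged.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical cost model (not Lucene code): documents whose field-cache data must be
// loaded after a reopen. Segments are identified by name and mapped to their doc counts.
final class ReloadCost {
    // Pre-2.9 style: one cache entry for the whole top-level reader, so any reopen
    // reloads every document.
    static long topLevel(Map<String, Long> newSegments) {
        long total = 0;
        for (long docs : newSegments.values()) total += docs;
        return total;
    }

    // 2.9 style: the cache is per segment, so only segments absent from the old
    // reader need loading.
    static long perSegment(Set<String> oldSegments, Map<String, Long> newSegments) {
        long total = 0;
        for (Map.Entry<String, Long> e : newSegments.entrySet()) {
            if (!oldSegments.contains(e.getKey())) total += e.getValue();
        }
        return total;
    }
}
```

For example, if a 1,000,000-doc segment _a is untouched while small segments are flushed and merged into _d (10,500 docs) and _e (100 docs), the per-segment cost is 10,600 loads versus 1,010,600 for a top-level cache. If _a itself is merged away, the per-segment figure climbs back toward the full index, which is the large-segment case discussed here.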
>
> I don't believe deletes invalidate a field cache - terms from deleted
> docs stay in a field cache and segmentreaders use their freqStream as
> the fieldcache key. Only when the deletes are merged out would they
> invalidate - but because you're writing a new segment anyway ...
>
> - Mark
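Mark's point about deletes can be modeled by keying the cache on a per-segment object that is shared across reopens, rather than on the reader instance. SegmentCore, ModelSegmentReader, and CoreKeyedCache are hypothetical names; in Mark's description the shared key is the SegmentReader's freqStream. The weak-keyed map is just one plausible way to let an entry disappear once its core is merged away and no longer referenced.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical immutable per-segment data shared by every reader opened on the segment.
final class SegmentCore {
    final int[] fieldValues;
    SegmentCore(int[] fieldValues) { this.fieldValues = fieldValues; }
}

// Hypothetical segment reader: shared core plus its own deletions.
final class ModelSegmentReader {
    final SegmentCore core;
    final boolean[] deleted;
    ModelSegmentReader(SegmentCore core, boolean[] deleted) { this.core = core; this.deleted = deleted; }

    // Reopening after a delete yields a new reader over the same core with new deletions.
    ModelSegmentReader withDeletion(int doc) {
        boolean[] d = deleted.clone();
        d[doc] = true;
        return new ModelSegmentReader(core, d);
    }

    // The cache key is the shared core, not this reader object.
    Object cacheKey() { return core; }
}

final class CoreKeyedCache {
    // Weak keys: an entry can be collected once its core is no longer referenced.
    private final Map<Object, int[]> entries = new WeakHashMap<>();
    int loads = 0;

    int[] get(ModelSegmentReader r) {
        return entries.computeIfAbsent(r.cacheKey(), k -> { loads++; return r.core.fieldValues.clone(); });
    }
}
```

Reopening with a deletion produces a new reader but the same key, so the lookup is a hit and the deleted document's value simply stays in the cached array until a merge writes a new segment.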
>
> John Wang wrote:
>> I understand what you are saying. Let me detail what I am trying to say:
>>
>> When "currently processed segments" are flushed down, merge may
>> happen. When merges happen, some of those "stable segments" will be
>> invalidated, and so will the fieldcache data keyed by them.
>>
>> In a high update environment, such scenarios can happen quite often.
>>
>> The way the default mergePolicy works is that small segments get
>> merged into the larger segments. Eventually, what will be invalidated
>> would be a large segment, and when that happens, a large chunk of the
>> field cache would be invalidated.
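John's point that merging eventually reaches a large segment can be seen in a deliberately crude simulation, loosely inspired by log-style merging with a merge factor: every flush adds a segment of flushSize documents, and whenever the newest mergeFactor segments have the same size they are merged into one. LogMergeSim is hypothetical and ignores deletes, byte sizes, and the level ranges a real merge policy uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified log-style merging (not Lucene's merge policy).
final class LogMergeSim {
    // For each merge, returns the number of documents whose per-segment cache entries
    // are dropped, i.e. the total size of the segments merged away.
    static List<Integer> invalidatedPerMerge(int flushes, int flushSize, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();    // segment sizes, oldest first
        List<Integer> invalidated = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            segments.add(flushSize);                   // a flush adds one small segment
            // Cascade: one merge may complete a group of larger segments, and so on.
            while (segments.size() >= mergeFactor && tailIsOneLevel(segments, mergeFactor)) {
                int n = segments.size();
                int total = 0;
                for (int i = n - mergeFactor; i < n; i++) total += segments.get(i);
                segments.subList(n - mergeFactor, n).clear();
                segments.add(total);
                invalidated.add(total);
            }
        }
        return invalidated;
    }

    // True when the newest mergeFactor segments all have the same size.
    private static boolean tailIsOneLevel(List<Integer> segments, int mergeFactor) {
        int n = segments.size();
        int size = segments.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i) != size) return false;
        }
        return true;
    }
}
```

With 100 one-document flushes and a merge factor of 10, nine merges each drop ten documents' worth of cache, and the hundredth flush triggers a cascade: one more ten-document merge, then a merge that drops all 100. The cost of a single merge grows with the level it completes.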
>>
>> Furthermore, in the case where there are high updates, the stable
>> segments can be invalidated much sooner when there are deletes in those
>> segments, and I would guess the corresponding FieldCache needs to be
>> adjusted. Not sure how it is handled right now.
>>
>> Just my two cents, and of course when I find the time I will need to
>> run some tests to see.
>>
>> -John
>>
>> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>
>>     The NRT reader coming from IndexWriter.getReader() only has
>>     changes in the currently processed segments; the other segments
>>     stay stable (and so do their IndexReader keys used for the
>>     FieldCache). For the consumer it looks like a normal reader (it
>>     is in fact a ReadOnlyDirectoryReader) supporting
>>     getSequentialSubReaders() and so on.
>>
>>
>>
>>     -----
>>     Uwe Schindler
>>     H.-H.-Meier-Allee 63, D-28213 Bremen
>>     http://www.thetaphi.de
>>     eMail: uwe@thetaphi.de
>>
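Uwe's description of the NRT reader can be sketched as a reopen that reuses the existing sub-reader object for every segment it has seen before and creates new ones only for new segments. SubReader and NrtReaderModel are hypothetical; sequentialSubReaders() only mirrors the name of getSequentialSubReaders(), and real segment readers also carry deletions and other state that this ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-segment reader, identified only by its segment name.
final class SubReader {
    final String segmentName;
    SubReader(String segmentName) { this.segmentName = segmentName; }
}

// Hypothetical model of an NRT reopen that keeps unchanged sub-readers as the same objects.
final class NrtReaderModel {
    private final Map<String, SubReader> bySegment = new LinkedHashMap<>();

    // Build the next point-in-time reader from the writer's current segment names.
    NrtReaderModel reopen(List<String> currentSegments) {
        NrtReaderModel next = new NrtReaderModel();
        for (String name : currentSegments) {
            SubReader existing = bySegment.get(name);
            // Reuse the old object when the segment is unchanged; otherwise open a new one.
            next.bySegment.put(name, existing != null ? existing : new SubReader(name));
        }
        return next;
    }

    List<SubReader> sequentialSubReaders() { return new ArrayList<>(bySegment.values()); }
}
```

Because the reused sub-readers are the very same objects, any cache keyed on them, as Uwe says the FieldCache is, keeps hitting; only a new or merged segment brings a fresh key.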
>>     ------------------------------------------------------------------------
>>
>>     *From:* John Wang [mailto:john.wang@gmail.com]
>>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>>     *To:* java-dev@lucene.apache.org
>>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>>
>>
>>
>>     Thanks Mark for the pointer!
>>
>>     I guess my point is that with NRT, when segment files change often,
>>     this would be an issue, no?
>>
>>     Anyway, I can run some tests.
>>
>>     Thanks
>>
>>     -John
>>
>>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>>     <markrmiller@gmail.com> wrote:
>>
>>     1483 - IndexSearcher pulls out a reader's subreaders
>>     (SegmentReaders) and sends a collector over them one by one,
>>     rather than using the multireader. So only the FieldCache entries
>>     for segment readers that change need to be reloaded.
>>
>>     - Mark
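The per-segment search Mark attributes to 1483 can be sketched as follows: the searcher walks the sub-readers in order and hands the collector each segment together with its docBase, so sorting only ever touches per-segment cache arrays. SegmentCollector, MinValueCollector, and PerSegmentSearch are hypothetical; Lucene 2.9's own Collector receives each segment through setNextReader(IndexReader reader, int docBase).

```java
import java.util.List;

// Hypothetical, simplified collector that sees one segment at a time plus its docBase
// (the segment's offset in the top-level doc-id space).
interface SegmentCollector {
    void setNextSegment(int[] segmentValues, int docBase);
    void collect(int segmentDoc);
}

// Keeps the single document with the smallest sort value across all segments.
final class MinValueCollector implements SegmentCollector {
    private int[] values;
    private int docBase;
    int bestDoc = -1;                  // top-level doc id
    int bestValue = Integer.MAX_VALUE;

    public void setNextSegment(int[] segmentValues, int docBase) {
        this.values = segmentValues;   // this segment's cache array only
        this.docBase = docBase;
    }

    public void collect(int segmentDoc) {
        int v = values[segmentDoc];
        if (v < bestValue) {
            bestValue = v;
            bestDoc = docBase + segmentDoc;
        }
    }
}

final class PerSegmentSearch {
    // Visit each segment in order instead of one merged top-level view.
    static void search(List<int[]> segments, SegmentCollector c) {
        int docBase = 0;
        for (int[] seg : segments) {
            c.setNextSegment(seg, docBase);
            for (int doc = 0; doc < seg.length; doc++) c.collect(doc);
            docBase += seg.length;
        }
    }
}
```

Adding docBase to the segment-local id is what lets hits from separate segments be compared in one top-level doc-id space without ever building a top-level cache array.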
>>
>>
>>
>>     http://www.lucidimagination.com (mobile)
>>
>>
>>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
>>
>>>     Hi Yonik:
>>>
>>>          Actually that is what I am looking for. Can you please point
>>>     me to where/how sorting is done per-segment?
>>>
>>>          When heavy indexing introduces or modifies segments, would
>>>     it cause reloading of the FieldCache at query time and thus
>>>     impact search performance?
>>>
>>>     thanks
>>>
>>>     -John
>>>
>>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>>>     <yonik@lucidimagination.com> wrote:
>>>
>>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
>>>     > Looking at the code, it seems there is a disconnect between how/when
>>>     > the field cache is loaded and when IndexWriter.getReader() is called.
>>>
>>>     I'm not sure what you mean by "disconnect"
>>>
>>>     > Is FieldCache updated?
>>>
>>>     FieldCache entries are populated on demand, as they always have been.
>>>
>>>
>>>     > Otherwise, are we reloading FieldCache for each
>>>     > reader instance?
>>>
>>>     Searching/sorting is now per-segment, and so is the use of the
>>>     FieldCache. Segments that don't change shouldn't have to reload
>>>     their FieldCache entries.
>>>
>>>     -Yonik
>>>     http://www.lucidimagination.com
>>>
>>>     ---------------------------------------------------------------------
>>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
