lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Tue, 22 Sep 2009 23:15:09 GMT
Right - when a large segment is invalidated, you will have a bigger
fieldcache piece to reload. Pre-2.9, though, you'd be reloading the
*whole* field cache every time. Sounds like you are trying to deal with
those large segments changing anyway :) They always seem to be an issue
when doing RT.

I don't believe deletes invalidate a field cache - terms from deleted
docs stay in a field cache, and SegmentReaders use their freqStream as
the fieldcache key. Only when the deletes are merged out would they
invalidate - but because you're writing a new segment anyway ...
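
To make the keying argument concrete, here is a rough sketch in plain
Java. It is not Lucene code: Segment, PerSegmentCache, cacheKey and
loads are all made-up names for illustration, and cacheKey only stands
in for whatever object the reader uses as its fieldcache key. The
assumption is that a reopen which only adds deletes keeps the same key,
while a merge produces a new segment with a new key.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a segment reader; not a Lucene class.
final class Segment {
    final String name;
    final int[] values;    // one sortable value per document
    final Object cacheKey; // assumed to survive reopens that only add deletes

    Segment(String name, int[] values, Object cacheKey) {
        this.name = name;
        this.values = values;
        this.cacheKey = cacheKey;
    }

    // A reopened view of the same segment: same data, same cache key.
    Segment reopenWithDeletes() {
        return new Segment(name, values, cacheKey);
    }
}

// Hypothetical per-segment cache keyed by the segment's cache key.
final class PerSegmentCache {
    private final Map<Object, int[]> cache = new WeakHashMap<>();
    int loads = 0; // how many times an entry had to be (re)built

    int[] get(Segment s) {
        int[] cached = cache.get(s.cacheKey);
        if (cached == null) {
            cached = s.values.clone(); // stands in for the expensive load
            cache.put(s.cacheKey, cached);
            loads++;
        }
        return cached;
    }
}

final class FieldCacheSketch {
    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();
        Segment big = new Segment("_0", new int[] {3, 1, 2}, new Object());
        Segment small = new Segment("_1", new int[] {9}, new Object());

        cache.get(big);
        cache.get(small);
        System.out.println("loads after first search: " + cache.loads);

        // Deletes only: the reopened segment shares the key, so no reload.
        cache.get(big.reopenWithDeletes());
        System.out.println("loads after deletes: " + cache.loads);

        // Merge: a new segment with a new key, so its entry is built fresh.
        Segment merged = new Segment("_2", new int[] {3, 1, 2, 9}, new Object());
        cache.get(merged);
        System.out.println("loads after merge: " + cache.loads);
    }
}
```

In this toy model the cost of a reload tracks the size of the segment
that was replaced, not the size of the whole index, which is the point
about large segments above.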

- Mark

John Wang wrote:
> I understand what you are saying. Let me detail what I am trying to say:
>
> When "currently processed segments" are flushed down, merge may
> happen. When merges happen, some of those "stable segments" will be
> invalidated, and so will the fieldcache data keyed by them.
>
> In a high update environment, such scenarios can happen quite often.
>
> The way the default mergePolicy works is that small segments get
> merged into the larger segments. Eventually, what will be invalidated
> would be a large segment, and when that happens, a large chunk of the
> field cache would be invalidated.
>
> Furthermore, in the case where there are high updates, the stable
> segments can be invalidated much sooner when there are deletes in those
> segments, and I would guess the corresponding FieldCache needs to be
> adjusted. Not sure how it is handled right now.
>
> Just my two cents, and of course when I find the time I will need to
> run some tests to see.
>
> -John
>
> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
> <mailto:uwe@thetaphi.de>> wrote:
>
>     The NRT reader coming from IndexWriter.getReader() only has
>     changes in the currently processed segments; the other segments
>     stay stable (and so do their IndexReader keys used for the
>     FieldCache). For the consumer it looks like a normal reader (it
>     is in fact a ReadOnlyDirectoryReader) supporting
>     getSequentialSubReaders() and so on.
>
>      
>
>     -----
>     Uwe Schindler
>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     http://www.thetaphi.de
>     eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>
>     ------------------------------------------------------------------------
>
>     *From:* John Wang [mailto:john.wang@gmail.com
>     <mailto:john.wang@gmail.com>]
>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     *To:* java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>
>      
>
>     Thanks Mark for the pointer!
>
>     I guess my point is with NRT, and when segment files change often,
>     this would be an issue, no?
>
>     Anyway, I can run some tests.
>
>     Thanks
>
>     -John
>
>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>     <markrmiller@gmail.com <mailto:markrmiller@gmail.com>> wrote:
>
>     1483 - IndexSearcher pulls out a reader's subreaders
>     (SegmentReaders) and sends a collector over them one by one,
>     rather than using the MultiReader. So only the FieldCache entries
>     for segment readers that change need to be reloaded.
>
>     - Mark
>
>      
>
>     http://www.lucidimagination.com (mobile)
>
>
>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
>     <mailto:john.wang@gmail.com>> wrote:
>
>>     Hi Yonik:
>>
>>          Actually that is what I am looking for. Can you please point
>>     me to where/how sorting is done per-segment?
>>
>>          When heavy indexing introduces or modifies segments, would
>>     it cause reloading of FieldCache at query time and thus would
>>     impact search performance?
>>
>>     thanks
>>
>>     -John
>>
>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>>     <yonik@lucidimagination.com <mailto:yonik@lucidimagination.com>>
>>     wrote:
>>
>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com
>>     <mailto:john.wang@gmail.com>> wrote:
>>     > Looking at the code, seems there is a disconnect between
>>     how/when field
>>     > cache is loaded when IndexWriter.getReader() is called.
>>
>>     I'm not sure what you mean by "disconnect"
>>
>>     > Is FieldCache updated?
>>
>>     FieldCache entries are populated on demand, as they always have been.
>>
>>
>>     > Otherwise, are we reloading FieldCache for each
>>     > reader instance?
>>
>>     Searching/sorting is now per-segment, and so is the use of the
>>     FieldCache.  Segments that don't change shouldn't have to reload
>>     their
>>     FieldCache entries.
>>
>>     -Yonik
>>     http://www.lucidimagination.com
>>
>>     ---------------------------------------------------------------------
>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>     <mailto:java-dev-unsubscribe@lucene.apache.org>
>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     <mailto:java-dev-help@lucene.apache.org>
>>
>>      
>>
>      
>
>


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

