lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Tue, 22 Sep 2009 23:57:26 GMT
Oh - yeah - also - you'll be passed a segment reader if that's what makes
sense. And sense it does, so you will be passed one every time. You can
warm a multireader the same way though, so there is no reason to pin the
API down to a SegmentReader.
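
A minimal sketch of that idea, written against stand-in classes rather
than the real Lucene API. Segment, SegmentWarmer, PerSegmentCache and
WarmerSketch are all hypothetical names made up for illustration; in
Lucene the callback would be an IndexReaderWarmer handed to
IndexWriter.setMergedSegmentWarmer. The sketch only models the behaviour
discussed in this thread: the cache is keyed per segment, so a merge
costs one new cache load for the merged segment, and the warmer pays
that cost before searches see the segment.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class WarmerSketch {
    // Simulates a merge: builds one new segment from two inputs, warms it,
    // and only then swaps it into the list of live segments.
    static Segment merge(List<Segment> live, Segment a, Segment b, SegmentWarmer warmer) {
        int[] merged = new int[a.values.length + b.values.length];
        System.arraycopy(a.values, 0, merged, 0, a.values.length);
        System.arraycopy(b.values, 0, merged, a.values.length, b.values.length);
        Segment result = new Segment(a.name + "+" + b.name, merged);
        warmer.warm(result); // warm before the segment becomes visible
        live.remove(a);
        live.remove(b);
        live.add(result);
        return result;
    }

    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();
        List<Segment> live = new ArrayList<>();
        Segment s0 = new Segment("s0", new int[] {5, 1, 9});
        Segment s1 = new Segment("s1", new int[] {4});
        Segment s2 = new Segment("s2", new int[] {7});
        live.add(s0); live.add(s1); live.add(s2);

        for (Segment s : live) cache.get(s); // first search loads all three
        merge(live, s1, s2, cache::get);     // warmer loads only the merged segment
        for (Segment s : live) cache.get(s); // second search: nothing left to load
        System.out.println("loads = " + cache.loads); // prints "loads = 4"
    }
}

// Hypothetical stand-in for a segment reader; NOT a Lucene class.
final class Segment {
    final String name;
    final int[] values; // one sortable value per document
    Segment(String name, int[] values) { this.name = name; this.values = values; }
}

// Hypothetical callback with the same shape as IndexReaderWarmer.warm(reader).
interface SegmentWarmer {
    void warm(Segment segment);
}

// Toy cache keyed by segment identity, so unchanged segments keep their entry.
final class PerSegmentCache {
    private final Map<Segment, int[]> entries = new IdentityHashMap<>();
    int loads = 0; // how many times an entry had to be built

    synchronized int[] get(Segment segment) {
        int[] cached = entries.get(segment);
        if (cached == null) {
            cached = segment.values.clone(); // stands in for loading a field cache entry
            entries.put(segment, cached);
            loads++;
        }
        return cached;
    }
}
```

The large segment s0 is loaded once and never again, which is the
per-segment behaviour described further down the thread. A real warmer
would load whatever the searches need (a fieldcache, a sample search)
instead of calling a toy cache.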

Mark Miller wrote:
> Come on dude :) Spend a half ounce of effort first. Mike's time is too
> valuable !
>
> Luckily mine is not.
>
> There is no default impl - the class is dead simple (and the class has
> been pointed out like 3 times in this thread - I'm not even fully
> following and I know where to find it):
>
>   public static abstract class IndexReaderWarmer {
>     public abstract void warm(IndexReader reader) throws IOException;
>   }
>
> Now pass something in that warms the reader. Load a fieldcache - do a
> search. Do the hokey pokey and turn yourself around ...
>
> Investigation time: 5 seconds.
>
> John Wang wrote:
>   
>> Hi Michael:
>>
>>      Thanks for the pointer!
>>
>>       Pardon my ignorance, but I am still not seeing the connection
>> between this API and per-segment loading of FieldCache. (The API takes
>> in an IndexReader instead of maybe SegmentReader[].)
>>
>>       Can you point me to maybe the default impl of IndexReaderWarmer
>> to help me understand?
>>
>> Thanks
>>
>> -John
>>
>> On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>
>>     This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
>>     can warm the reader w/o blocking ongoing updates.
>>
>>     Mike
>>
>>     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
>>     <markrmiller@gmail.com> wrote:
>>     > Right - when a large segment is invalidated, you will have a bigger
>>     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
>>     > field cache every time though. Sounds like you are trying to deal with
>>     > those large segments changing anyway :) They are always an issue when
>>     > doing RT it seems.
>>     >
>>     > I don't believe deletes invalidate a field cache - terms from deleted
>>     > docs stay in a field cache and segmentreaders use their freqStream as
>>     > the fieldcache key. Only when the deletes are merged out would they
>>     > invalidate - but because you're writing a new segment anyway ...
>>     >
>>     > - Mark
>>     >
>>     > John Wang wrote:
>>     >> I understand what you are saying. Let me detail what I am trying to say:
>>     >>
>>     >> When "currently processed segments" are flushed down, merge may
>>     >> happen. When merges happen, some of those "stable segments" will be
>>     >> invalidated, and so will the fieldcache data keyed by them.
>>     >>
>>     >> In a high update environment, such scenarios can happen quite often.
>>     >>
>>     >> The way the default mergePolicy works is that small segments get
>>     >> merged into the larger segments. Eventually, what will be invalidated
>>     >> would be a large segment, and when that happens, a large chunk of the
>>     >> field cache would be invalidated.
>>     >>
>>     >> Furthermore, in the case where there are high updates, the stable
>>     >> segments can be invalidated much sooner when there are deletes in those
>>     >> segments, and I would guess the corresponding FieldCache needs to be
>>     >> adjusted. Not sure how it is handled right now.
>>     >>
>>     >> Just my two cents, and of course when I find the time I will need to
>>     >> run some tests to see.
>>     >>
>>     >> -John
>>     >>
>>     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>     >>
>>     >>     The NRT reader coming from IndexWriter.getReader() only has
>>     >>     changes in the currently processed segments; the other segments
>>     >>     stay stable (and so do their IndexReader keys used for the
>>     >>     FieldCache). For the consumer it looks like a normal reader
>>     >>     (it is in fact a ReadOnlyDirectoryReader) supporting
>>     >>     getSequentialSubReaders() and so on.
>>     >>
>>     >>
>>     >>
>>     >>     -----
>>     >>     Uwe Schindler
>>     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>>     >>     http://www.thetaphi.de
>>     >>     eMail: uwe@thetaphi.de
>>     >>
>>     >>     ------------------------------------------------------------------------
>>     >>
>>     >>     *From:* John Wang [mailto:john.wang@gmail.com]
>>     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>>     >>     *To:* java-dev@lucene.apache.org
>>     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>>     >>
>>     >>
>>     >>
>>     >>     Thanks Mark for the pointer!
>>     >>
>>     >>     I guess my point is with NRT, and when segment files change often,
>>     >>     this would be an issue, no?
>>     >>
>>     >>     Anyway, I can run some tests.
>>     >>
>>     >>     Thanks
>>     >>
>>     >>     -John
>>     >>
>>     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>>     >>     <markrmiller@gmail.com> wrote:
>>     >>
>>     >>     LUCENE-1483 - IndexSearcher pulls out a reader's subreaders
>>     >>     (SegmentReaders) and sends a collector over them one by one,
>>     >>     rather than using the MultiReader. So only the fieldcache for
>>     >>     segment readers that change needs to be reloaded.
>>     >>
>>     >>     - Mark
>>     >>
>>     >>
>>     >>
>>     >>     http://www.lucidimagination.com (mobile)
>>     >>
>>     >>
>>     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
>>     >>
>>     >>>     Hi Yonik:
>>     >>>
>>     >>>          Actually that is what I am looking for. Can you please point
>>     >>>     me to where/how sorting is done per-segment?
>>     >>>
>>     >>>          When heavy indexing introduces or modifies segments, would
>>     >>>     it cause reloading of FieldCache at query time and thus would
>>     >>>     impact search performance?
>>     >>>
>>     >>>     thanks
>>     >>>
>>     >>>     -John
>>     >>>
>>     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>>     >>>     <yonik@lucidimagination.com> wrote:
>>     >>>
>>     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
>>     >>>     <john.wang@gmail.com> wrote:
>>     >>>     > Looking at the code, seems there is a disconnect between
>>     >>>     > how/when field cache is loaded when IndexWriter.getReader()
>>     >>>     > is called.
>>     >>>
>>     >>>     I'm not sure what you mean by "disconnect"
>>     >>>
>>     >>>     > Is FieldCache updated?
>>     >>>
>>     >>>     FieldCache entries are populated on demand, as they always have been.
>>     >>>
>>     >>>
>>     >>>     > Otherwise, are we reloading FieldCache for each
>>     >>>     > reader instance?
>>     >>>
>>     >>>     Searching/sorting is now per-segment, and so is the use of the
>>     >>>     FieldCache.  Segments that don't change shouldn't have to reload
>>     >>>     their FieldCache entries.
>>     >>>
>>     >>>     -Yonik
>>     >>>     http://www.lucidimagination.com
>>     >>>
>>     >>>     ---------------------------------------------------------------------
>>     >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>     >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     >>>
>>     >>>
>>     >>>
>>     >>
>>     >>
>>     >>
>>     >
>>     >
>>     > --
>>     > - Mark
>>     >
>>     > http://www.lucidimagination.com
>>
>>
>>
>>     
>
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

