lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Tue, 22 Sep 2009 23:55:33 GMT
Come on dude :) Spend a half ounce of effort first. Mike's time is too
valuable!

Luckily mine is not.

There is no default impl - the class is dead simple (and the class has
been pointed out like 3 times in this thread - I'm not even fully
following and I know where to find it):

  public static abstract class IndexReaderWarmer {
    public abstract void warm(IndexReader reader) throws IOException;
  }

Now pass something in that warms the reader. Load a fieldcache - do a
search. Do the hokey pokey and turn yourself around ...
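
To make "pass something in that warms the reader" concrete, here is a minimal sketch. It deliberately does not use Lucene: SegmentStub, Warmer and ToyFieldCache are hypothetical stand-ins invented for illustration, so the example runs on its own. It only shows the shape of the idea: the warmer pays the cache-loading cost for a new segment up front, so a later search finds the entry already there.

```java
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

// Stand-in types only: none of these are Lucene classes.
class WarmerSketch {

    /** Hypothetical stand-in for one segment's reader (not org.apache.lucene.index.IndexReader). */
    static final class SegmentStub {
        final String name;
        final String[] fieldValues; // one value per document in this segment

        SegmentStub(String name, String[] fieldValues) {
            this.name = name;
            this.fieldValues = fieldValues;
        }
    }

    /** Same shape as the callback quoted above, but over the stand-in reader type. */
    abstract static class Warmer {
        abstract void warm(SegmentStub reader) throws IOException;
    }

    /** Toy per-segment cache: entries are keyed by reader identity and filled on first use. */
    static final class ToyFieldCache {
        private final Map<SegmentStub, String[]> entries = new IdentityHashMap<>();
        private int loads = 0;

        synchronized String[] get(SegmentStub reader) {
            String[] values = entries.get(reader);
            if (values == null) {
                loads++; // the expensive step a warmer is meant to pay up front
                values = reader.fieldValues.clone();
                entries.put(reader, values);
            }
            return values;
        }

        synchronized int loadCount() {
            return loads;
        }
    }

    public static void main(String[] args) throws IOException {
        ToyFieldCache cache = new ToyFieldCache();

        // The warmer just touches the cache for the reader it is handed.
        Warmer warmer = new Warmer() {
            @Override
            void warm(SegmentStub reader) throws IOException {
                cache.get(reader);
            }
        };

        SegmentStub merged = new SegmentStub("_merged", new String[] {"b", "a", "c"});
        warmer.warm(merged); // stands in for the call made when a merge finishes
        cache.get(merged);   // a later "search" finds the entry already loaded

        System.out.println("loads for " + merged.name + ": " + cache.loadCount());
    }
}
```

In real 2.9 code the warmer would extend IndexWriter.IndexReaderWarmer and be registered with IndexWriter.setMergedSegmentWarmer; what it does inside warm() (load a FieldCache entry, run a search) is up to you.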

Investigation time: 5 seconds.

John Wang wrote:
> Hi Michael:
>
>      Thanks for the pointer!
>
>       Pardon my ignorance, but I am still not seeing the connection
> between this API and per-segment loading of FieldCache. (The API takes
> in an IndexReader instead of maybe SegmentReader[].)
>
>       Can you point me to maybe the default impl of IndexReaderWarmer
> to help me understand?
>
> Thanks
>
> -John
>
> On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>
>     This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
>     can warm the reader w/o blocking ongoing updates.
>
>     Mike
>
>     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
>     <markrmiller@gmail.com> wrote:
>     > Right - when a large segment is invalidated, you will have a bigger
>     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
>     > field cache every time though. Sounds like you are trying to deal with
>     > those large segments changing anyway :) They are always an issue when
>     > doing RT it seems.
>     >
>     > I don't believe deletes invalidate a field cache - terms from deleted
>     > docs stay in a field cache and segmentreaders use their freqStream as
>     > the fieldcache key. Only when the deletes are merged out would they
>     > invalidate - but because you're writing a new segment anyway ...
>     >
>     > - Mark
>     >
>     > John Wang wrote:
>     >> I understand what you are saying. Let me detail what I am trying to say:
>     >>
>     >> When "currently processed segments" are flushed down, merge may
>     >> happen. When merges happen, some of those "stable segments" will be
>     >> invalidated, and so will the fieldcache data keyed by them.
>     >>
>     >> In a high update environment, such scenarios can happen quite often.
>     >>
>     >> The way the default mergePolicy works is that small segments get
>     >> merged into the larger segments. Eventually, what will be invalidated
>     >> would be a large segment, and when that happens, a large chunk of the
>     >> field cache would be invalidated.
>     >>
>     >> Furthermore, in the case where there are high updates, the stable
>     >> segments can be invalidated much sooner when there are deletes in those
>     >> segments, and I would guess the corresponding FieldCache needs to be
>     >> adjusted. Not sure how it is handled right now.
>     >>
>     >> Just my two cents, and of course when I find the time I will need to
>     >> run some tests to see.
>     >>
>     >> -John
>     >>
>     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     >>
>     >>     The NRT reader coming from IndexWriter.getReader() has
>     >>     changes only in the currently processed segments; the other
>     >>     segments remain stable (and so do their IndexReader keys used
>     >>     for the FieldCache). For the consumer it looks like a normal
>     >>     reader (it is in fact a ReadOnlyDirectoryReader) supporting
>     >>     getSequentialSubReaders() and so on.
>     >>
>     >>
>     >>
>     >>     -----
>     >>     Uwe Schindler
>     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     >>     http://www.thetaphi.de
>     >>     eMail: uwe@thetaphi.de
>     >>
>     >>     ------------------------------------------------------------------------
>     >>
>     >>     *From:* John Wang [mailto:john.wang@gmail.com]
>     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     >>     *To:* java-dev@lucene.apache.org
>     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>     >>
>     >>
>     >>
>     >>     Thanks Mark for the pointer!
>     >>
>     >>     I guess my point is with NRT, and when segment files change often,
>     >>     this would be an issue, no?
>     >>
>     >>     Anyway, I can run some tests.
>     >>
>     >>     Thanks
>     >>
>     >>     -John
>     >>
>     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>     >>     <markrmiller@gmail.com> wrote:
>     >>
>     >>     LUCENE-1483 - IndexSearcher pulls out a reader's subreaders
>     >>     (SegmentReaders) and sends a collector over them one by one,
>     >>     rather than using the MultiReader. So only the FieldCache entries
>     >>     for segment readers that change need to be reloaded.
>     >>
>     >>     - Mark
>     >>
>     >>
>     >>
>     >>     http://www.lucidimagination.com (mobile)
>     >>
>     >>
>     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com> wrote:
>     >>
>     >>>     Hi Yonik:
>     >>>
>     >>>          Actually that is what I am looking for. Can you please point
>     >>>     me to where/how sorting is done per-segment?
>     >>>
>     >>>          When heavy indexing introduces or modifies segments, would
>     >>>     it cause reloading of FieldCache at query time and thus
>     >>>     impact search performance?
>     >>>
>     >>>     thanks
>     >>>
>     >>>     -John
>     >>>
>     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>     >>>     <yonik@lucidimagination.com> wrote:
>     >>>
>     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com> wrote:
>     >>>     > Looking at the code, seems there is a disconnect between
>     >>>     how/when field
>     >>>     > cache is loaded when IndexWriter.getReader() is called.
>     >>>
>     >>>     I'm not sure what you mean by "disconnect"
>     >>>
>     >>>     > Is FieldCache updated?
>     >>>
>     >>>     FieldCache entries are populated on demand, as they always have been.
>     >>>
>     >>>
>     >>>     > Otherwise, are we reloading FieldCache for each
>     >>>     > reader instance?
>     >>>
>     >>>     Searching/sorting is now per-segment, and so is the use of the
>     >>>     FieldCache.  Segments that don't change shouldn't have to reload
>     >>>     their FieldCache entries.
>     >>>
>     >>>     -Yonik
>     >>>     http://www.lucidimagination.com
>     >>>
>     >>>     ---------------------------------------------------------------------
>     >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>     >>>
>     >>>
>     >>>
>     >>
>     >>
>     >>
>     >
>     >
>     > --
>     > - Mark
>     >
>     > http://www.lucidimagination.com
>     >
>     >
>     >
>     >
>     >
>     >
>
>
>


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

