lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Tue, 22 Sep 2009 23:43:29 GMT
Hi Michael:

     Thanks for the pointer!

      Pardon my ignorance, but I am still not seeing the connection between
this API and per-segment loading of the FieldCache (the API takes an
IndexReader rather than, say, a SegmentReader[]).

      Can you point me to maybe the default impl of IndexReaderWarmer to
help me understand?

Thanks

-John

On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
> can warm the reader w/o blocking ongoing updates.
>
> Mike
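A minimal sketch of the idea behind Mike's suggestion, written against a toy in-memory model rather than Lucene itself. Every name below (`Segment`, `SegmentFieldCache`, `MergedSegmentWarmer`, `ToyWriter`) is hypothetical. It tries to show how a warmer that receives a single reader ties into per-segment caching: the object handed to the callback is the newly merged segment, so loading its cache entry there means the first query after the merge finds that entry already populated. In Lucene 2.9 itself, as I understand the javadocs, the hook is `IndexWriter.setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer)`, and `warm(IndexReader)` is called with the merged segment's reader before the merge is committed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Lucene's API: a segment is a name plus one int "field" column.
final class Segment {
    final String name;
    final int[] fieldValues;

    Segment(String name, int[] fieldValues) {
        this.name = name;
        this.fieldValues = fieldValues;
    }
}

// Stand-in for a per-segment FieldCache. Segment does not override equals/hashCode,
// so entries are keyed by segment identity.
final class SegmentFieldCache {
    private final Map<Segment, int[]> entries = new HashMap<>();
    int loads = 0; // how many times a segment's column was actually loaded

    int[] get(Segment s) {
        return entries.computeIfAbsent(s, seg -> {
            loads++;
            return seg.fieldValues.clone();
        });
    }
}

// Hypothetical callback playing the role of IndexWriter.IndexReaderWarmer.
interface MergedSegmentWarmer {
    void warm(Segment merged);
}

final class ToyWriter {
    final List<Segment> visible = new ArrayList<>(); // what searchers can currently see
    MergedSegmentWarmer warmer = s -> { };

    void merge(Segment a, Segment b, String newName) {
        int[] merged = new int[a.fieldValues.length + b.fieldValues.length];
        System.arraycopy(a.fieldValues, 0, merged, 0, a.fieldValues.length);
        System.arraycopy(b.fieldValues, 0, merged, a.fieldValues.length, b.fieldValues.length);
        Segment m = new Segment(newName, merged);
        warmer.warm(m);    // warm while the old segments are still the visible ones
        visible.remove(a); // only then swap the merged segment in
        visible.remove(b);
        visible.add(m);
    }
}
```

Wiring the warmer to the cache (`writer.warmer = seg -> cache.get(seg);`) moves the load from the first sorted query onto the merge, which is the "warm without blocking updates" effect Mike describes.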
>
> On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmiller@gmail.com>
> wrote:
> > Right - when a large segment is invalidated, you will have a bigger
> > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
> > field cache every time though. Sounds like you are trying to deal with
> > those large segments changing anyway :) They are always an issue when
> > doing RT it seems.
> >
> > I don't believe deletes invalidate a field cache - terms from deleted
> > docs stay in a field cache and segmentreaders use their freqStream as
> > the fieldcache key. Only when the deletes are merged out would they
> > invalidate - but then you're writing a new segment anyway ...
> >
> > - Mark
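A toy sketch of Mark's point about deletes, assuming a simplified model with hypothetical class names. It mirrors only the idea that every reader opened on a segment shares one immutable core object (Mark says 2.9's SegmentReader uses its freqStream for this) and that this shared object, not the reader instance, is the cache key. A reader carrying extra deletions is a new object over the same core, so it hits the existing entry; the deleted document's value simply stays in the cached column and must be skipped by the search.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the immutable part of a segment, shared by every reader opened on it.
final class SegmentCore {
    final int[] fieldValues;

    SegmentCore(int[] fieldValues) { this.fieldValues = fieldValues; }
}

// Hypothetical reader = shared core + its own view of deletions.
final class ToySegmentReader {
    final SegmentCore core;
    final BitSet deleted;

    ToySegmentReader(SegmentCore core, BitSet deleted) {
        this.core = core;
        this.deleted = deleted;
    }

    // Deletions are deliberately not part of the key: all readers on this core share it.
    Object cacheKey() { return core; }

    // Applying a delete yields a new reader object over the same core.
    ToySegmentReader withDeletion(int doc) {
        BitSet copy = (BitSet) deleted.clone();
        copy.set(doc);
        return new ToySegmentReader(core, copy);
    }
}

final class KeyedFieldCache {
    private final Map<Object, int[]> entries = new HashMap<>();
    int loads = 0;

    int[] get(ToySegmentReader r) {
        return entries.computeIfAbsent(r.cacheKey(), key -> {
            loads++;
            // Values of deleted docs are kept; searches skip them via the reader's deletions.
            return r.core.fieldValues.clone();
        });
    }
}
```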
> >
> > John Wang wrote:
> >> I understand what you are saying. Let me detail what I am trying to say:
> >>
> >> When "currently processed segments" are flushed down, merge may
> >> happen. When merges happen, some of those "stable segments" will be
> >> invalidated, and so will the fieldcache data keyed by them.
> >>
> >> In a high update environment, such scenarios can happen quite often.
> >>
> >> The way the default mergePolicy works is that small segments get
> >> merged into the larger segments. Eventually, what will be invalidated
> >> would be a large segment, and when that happens, a large chunk of the
> >> field cache would be invalidated.
> >>
> >> Furthermore, in the case where there are high updates, the stable
> >> segments can be invalidated much sooner when there are deletes in those
> >> segments, and I would guess the corresponding FieldCache needs to be
> >> adjusted. Not sure how it is handled right now.
> >>
> >> Just my two cents, and of course when I find the time I will need to
> >> run some tests to see.
> >>
> >> -John
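A back-of-the-envelope sketch of the cost John describes, counting reloaded documents rather than bytes and assuming the merge purges no deletions (so the merged segment holds exactly the documents of the segments it replaces). The class and method names are hypothetical.

```java
// Hypothetical arithmetic model: each segment is reduced to its document count.
final class MergeReloadCost {
    private static long sum(int[] docCounts) {
        long total = 0;
        for (int d : docCounts) total += d;
        return total;
    }

    // Per-segment cache (2.9): after a merge only the new segment needs loading,
    // and it contains exactly the documents of the segments it replaced.
    static long perSegment(int[] mergedAway) {
        return sum(mergedAway);
    }

    // Top-level cache (pre-2.9): any change means reloading every document.
    static long topLevel(int[] allSegmentsAfterChange) {
        return sum(allSegmentsAfterChange);
    }
}
```

With segments of 1,000,000, 50,000 and three of 10 docs, merging the three small ones costs `perSegment(new int[]{10, 10, 10})`, i.e. 30 reloaded docs, versus `topLevel(new int[]{1_000_000, 50_000, 30})`, i.e. 1,050,030, for a top-level cache. Once the policy eventually merges everything into one segment, though, `perSegment(new int[]{1_000_000, 50_000, 30})` is also 1,050,030, which is the large-segment case John is worried about.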
> >>
> >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de
> >> <mailto:uwe@thetaphi.de>> wrote:
> >>
> >>     The NRT reader coming from IndexWriter.getReader() only has
> >>     changes in the currently processed segments; the other segments
> >>     stay stable (and so do their IndexReader keys used for the
> >>     FieldCache). For the consumer it looks like a normal reader (it is
> >>     in fact a ReadOnlyDirectoryReader) supporting
> >>     getSequentialSubReaders() and so on.
> >>
> >>
> >>
> >>     -----
> >>     Uwe Schindler
> >>     H.-H.-Meier-Allee 63, D-28213 Bremen
> >>     http://www.thetaphi.de
> >>     eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
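A toy sketch of what Uwe describes, with hypothetical names and deletions ignored for simplicity: reopening keeps the same sub-reader object for every segment that still exists and creates one only for new segments, so per-segment cache entries keyed on those objects survive the reopen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-segment reader; its identity is what a cache would key on.
final class SubReader {
    final String segment;

    SubReader(String segment) { this.segment = segment; }
}

// Hypothetical NRT reader model (deletions ignored).
final class ToyNrtReader {
    private final Map<String, SubReader> bySegment;

    private ToyNrtReader(Map<String, SubReader> bySegment) { this.bySegment = bySegment; }

    static ToyNrtReader open(List<String> segments) {
        Map<String, SubReader> subs = new LinkedHashMap<>();
        for (String s : segments) subs.put(s, new SubReader(s));
        return new ToyNrtReader(subs);
    }

    // Reuse sub-readers for segments that are still live; open new ones only for new segments.
    ToyNrtReader reopen(List<String> liveSegments) {
        Map<String, SubReader> subs = new LinkedHashMap<>();
        for (String s : liveSegments) {
            SubReader old = bySegment.get(s);
            subs.put(s, old != null ? old : new SubReader(s));
        }
        return new ToyNrtReader(subs);
    }

    // Plays the role of getSequentialSubReaders(), not its signature.
    List<SubReader> sequentialSubReaders() { return new ArrayList<>(bySegment.values()); }
}
```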
> >>
> >>
> >>     ------------------------------------------------------------------------
> >>
> >>     *From:* John Wang [mailto:john.wang@gmail.com
> >>     <mailto:john.wang@gmail.com>]
> >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
> >>     *To:* java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
> >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
> >>
> >>
> >>
> >>     Thanks Mark for the pointer!
> >>
> >>     I guess my point is that with NRT, when segment files change often,
> >>     this would be an issue, no?
> >>
> >>     Anyway, I can run some tests.
> >>
> >>     Thanks
> >>
> >>     -John
> >>
> >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
> >>     <markrmiller@gmail.com <mailto:markrmiller@gmail.com>> wrote:
> >>
> >>     1483 - IndexSearcher pulls out a reader's subreaders
> >>     (SegmentReaders) and sends a collector over them one by one,
> >>     rather than using the MultiReader. So only the FieldCache entries
> >>     for segment readers that change need to be reloaded.
> >>
> >>     - Mark
> >>
> >>
> >>
> >>     http://www.lucidimagination.com (mobile)
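A self-contained sketch of the per-segment search Mark describes, using a toy model rather than Lucene classes. The searcher walks the segments in order and tells the collector which segment it is on plus that segment's docBase (the number of documents in the segments before it); the collector reads sort values from that segment's own column using segment-local doc IDs. The interface is loosely shaped after 2.9's `Collector.setNextReader(IndexReader, int docBase)` and `collect(int doc)` pair, but every name here is hypothetical.

```java
import java.util.List;

// Hypothetical segment: its cached sort column, indexed by segment-local doc ID.
final class Seg {
    final int[] sortValues;

    Seg(int... sortValues) { this.sortValues = sortValues; }
}

// Hypothetical collector contract, one segment at a time.
interface SegCollector {
    void setNextSegment(Seg seg, int docBase);
    void collect(int segLocalDoc);
}

// Finds the document with the largest sort value across all segments.
final class MaxValueCollector implements SegCollector {
    private int[] column;
    private int docBase;
    int bestGlobalDoc = -1;
    int bestValue = Integer.MIN_VALUE;

    @Override
    public void setNextSegment(Seg seg, int docBase) {
        this.column = seg.sortValues; // switch to this segment's cache entry
        this.docBase = docBase;
    }

    @Override
    public void collect(int doc) {
        if (column[doc] > bestValue) {
            bestValue = column[doc];
            bestGlobalDoc = docBase + doc; // convert to a top-level doc ID only when reporting
        }
    }
}

final class ToySearcher {
    static void search(List<Seg> segments, SegCollector collector) {
        int docBase = 0;
        for (Seg s : segments) {
            collector.setNextSegment(s, docBase);
            for (int doc = 0; doc < s.sortValues.length; doc++) collector.collect(doc);
            docBase += s.sortValues.length;
        }
    }
}
```

Because the collector only ever touches one segment's column at a time, nothing keyed on the whole multi-segment reader is needed, which is why unchanged segments can keep their cache entries.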
> >>
> >>
> >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com
> >>     <mailto:john.wang@gmail.com>> wrote:
> >>
> >>>     Hi Yonik:
> >>>
> >>>          Actually that is what I am looking for. Can you please point
> >>>     me to where/how sorting is done per-segment?
> >>>
> >>>          When heavy indexing introduces or modifies segments, would
> >>>     it cause reloading of FieldCache at query time and thus would
> >>>     impact search performance?
> >>>
> >>>     thanks
> >>>
> >>>     -John
> >>>
> >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
> >>>     <yonik@lucidimagination.com <mailto:yonik@lucidimagination.com>>
> >>>     wrote:
> >>>
> >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.wang@gmail.com
> >>>     <mailto:john.wang@gmail.com>> wrote:
> >>>     > Looking at the code, it seems there is a disconnect between
> >>>     > how/when the field cache is loaded and when
> >>>     > IndexWriter.getReader() is called.
> >>>
> >>>     I'm not sure what you mean by "disconnect"
> >>>
> >>>     > Is FieldCache updated?
> >>>
> >>>     FieldCache entries are populated on demand, as they always have been.
> >>>
> >>>
> >>>     > Otherwise, are we reloading FieldCache for each
> >>>     > reader instance?
> >>>
> >>>     Searching/sorting is now per-segment, and so is the use of the
> >>>     FieldCache. Segments that don't change shouldn't have to reload
> >>>     their FieldCache entries.
> >>>
> >>>     -Yonik
> >>>     http://www.lucidimagination.com
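A sketch of Yonik's two points under a toy model with hypothetical names: opening a reader loads nothing, the first sort on a field loads that field for each segment lacking an entry, and a later sort over a reader that has gained a segment only loads the new one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical segment holding its per-field values (what the cache would uninvert).
final class ToySeg {
    final Map<String, int[]> fields;

    ToySeg(Map<String, int[]> fields) { this.fields = fields; }
}

// Hypothetical on-demand cache: segment identity -> field name -> values.
final class LazyFieldCache {
    private final Map<ToySeg, Map<String, int[]>> entries = new HashMap<>();
    int loads = 0;

    int[] getInts(ToySeg seg, String field) {
        Map<String, int[]> perField = entries.computeIfAbsent(seg, s -> new HashMap<>());
        return perField.computeIfAbsent(field, f -> {
            loads++; // only reached the first time this (segment, field) pair is requested
            return seg.fields.get(f).clone();
        });
    }

    // A sorted search touches the field once per segment.
    void sortOn(List<ToySeg> segments, String field) {
        for (ToySeg s : segments) getInts(s, field);
    }
}
```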
> >>>
> >>>
> ---------------------------------------------------------------------
> >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >>>     <mailto:java-dev-unsubscribe@lucene.apache.org>
> >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>>     <mailto:java-dev-help@lucene.apache.org>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >
> >
> >
> >
> >
>
>
>
