lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Wed, 23 Sep 2009 00:39:43 GMT
No worries.
Just trying to understand things.

I wanted to double-check, but didn't want to write "My IDE told me that was
the case" and sound pissy.

I did look at the code, sometimes too much actually, but I never want to
claim I understand the code 100%. Going to the source is probably best,
even at the expense of sounding dumb; it is usually worth it ;)

My question is more about how a person would do it at the public API level,
without having to hack into the source code.

My main misunderstanding at this point is that I had thought
IndexReaderWarmer can directly warm the field cache deterministically.
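
To make the warming idea concrete, here is a minimal, self-contained sketch. It does not use Lucene itself: Segment, Warmer, ToyFieldCache and ToyWriter are hypothetical stand-ins, and only the shape of the callback (one warm() call per segment-level reader, set through something like setMergedSegmentWarmer) is taken from this thread. It assumes the cache is keyed by segment identity and that the warmer runs before the merged segment becomes visible to searches.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class WarmerSketch {

    /** Hypothetical stand-in for a segment-level reader (not a Lucene class). */
    static final class Segment {
        final int[] values; // one value per document for a single field
        Segment(int... values) { this.values = values; }
    }

    /** Same shape as the IndexReaderWarmer quoted in this thread: one callback, one reader. */
    interface Warmer {
        void warm(Segment segment);
    }

    /** Toy field cache: entries are keyed by segment identity and loaded on demand. */
    static final class ToyFieldCache {
        private final Map<Segment, int[]> entries = new IdentityHashMap<>();
        int loads = 0; // how many times a segment's values had to be loaded

        int[] getInts(Segment segment) {
            int[] cached = entries.get(segment);
            if (cached == null) {
                loads++;
                cached = segment.values.clone();
                entries.put(segment, cached);
            }
            return cached;
        }
    }

    /** Toy writer: hands each newly merged segment to the warmer before publishing it. */
    static final class ToyWriter {
        final List<Segment> visible = new ArrayList<>();
        private Warmer warmer; // may stay null, like having no warmer set

        void setMergedSegmentWarmer(Warmer warmer) { this.warmer = warmer; }

        void publishMerged(Segment merged) {
            if (warmer != null) {
                warmer.warm(merged); // runs before any search can see the segment
            }
            visible.add(merged);
        }
    }

    public static void main(String[] args) {
        ToyFieldCache cache = new ToyFieldCache();
        ToyWriter writer = new ToyWriter();
        writer.setMergedSegmentWarmer(segment -> cache.getInts(segment));

        writer.publishMerged(new Segment(3, 1, 2));
        int loadsAfterWarm = cache.loads;

        // The first "query" against the merged segment finds the entry already loaded.
        cache.getInts(writer.visible.get(0));
        System.out.println("loads after warm=" + loadsAfterWarm + ", after query=" + cache.loads);
    }
}
```

In this toy model the warmer is what makes the load deterministic: the entry exists before the segment is searchable, so the first query pays nothing. Whether the real warmer covers every new segment or only merged ones is exactly the kind of thing to confirm against the source.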

Thanks

-John

On Wed, Sep 23, 2009 at 8:33 AM, Mark Miller <markrmiller@gmail.com> wrote:

> Don't take me too seriously John - I doubt anyone does :)
>
> And I wasn't implying Mike's time was more valuable than yours. I was
> being ... uh ... me :)
>
> And I don't claim that all of your many questions could have been found
> in 5 seconds ;)
>
> Just the ones you were asking - it's very quick (at least with Eclipse)
> to see that there is no default impl.
> It's also very quick to see that a segment reader is passed to the warm
> method every time. I think it's just
> a generic IndexReader because you would warm a multi-reader the same way
> as a SegmentReader.
>
> I was just suggesting you look at the code a bit, because I think it's
> fairly easy to figure out the basics of the warmer (hey, if I can do it
> ;) ).
>
> Again, don't take me too seriously. I send out my comments faster than I
> can think of them. And I've probably wasted more of Mike's time than
> anyone.
>
> The only way you will load the entire FieldCache is to use a top-level
> reader outside of the core API - the core API works per segment now. And
> the IndexReaderWarmer is always passed a SegmentReader from the readerPool.
>
> - Mark
>
> John Wang wrote:
> > Mark:
> >
> > I did spend at least a quarter of an ounce. :) And I am sure Mike's
> > time is more valuable than mine, but it was meant to be a "double-check"
> >
> > I was under the impression there is a default impl from previous email
> > threads on how to handle field cache warming, perhaps I misunderstood.
> >
> > The real question here is the "warms the reader" part. From a public
> > API point of view, I wasn't sure if passing in an IndexReader impl is
> > something we can do to avoid loading the entire field cache, e.g. would
> > I need to downcast? Can it be a filtered reader? etc.
> >
> > If you think there is something I could have done within 5 sec, please
> > point me in the right direction.
> >
> > Thanks
> >
> > -John
> >
> > On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >
> >     Come on dude :) Spend a half ounce of effort first. Mike's time is
> >     too valuable!
> >
> >     Luckily mine is not.
> >
> >     There is no default impl - the class is dead simple (and the class
> >     has been pointed out like 3 times in this thread - I'm not even fully
> >     following and I know where to find it):
> >
> >      public static abstract class IndexReaderWarmer {
> >        public abstract void warm(IndexReader reader) throws IOException;
> >      }
> >
> >     Now pass something in that warms the reader. Load a FieldCache - do a
> >     search. Do the hokey pokey and turn yourself around ...
> >
> >     Investigation time: 5 seconds.
> >
> >     John Wang wrote:
> >     > Hi Michael:
> >     >
> >     >      Thanks for the pointer!
> >     >
> >     >       Pardon my ignorance, but I am still not seeing the connection
> >     > between this API and per-segment loading of FieldCache. (The API
> >     > takes in an IndexReader instead of maybe SegmentReader[].)
> >     >
> >     >       Can you point me to maybe the default impl of
> >     > IndexReaderWarmer to help me understand?
> >     >
> >     > Thanks
> >     >
> >     > -John
> >     >
> >     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
> >     > <lucene@mikemccandless.com> wrote:
> >     >
> >     >     This is exactly why we added IndexWriter.setMergedSegmentWarmer
> >     >     -- you can warm the reader w/o blocking ongoing updates.
> >     >
> >     >     Mike
> >     >
> >     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
> >     >     <markrmiller@gmail.com> wrote:
> >     >     > Right - when a large segment is invalidated, you will have a
> >     >     > bigger fieldcache piece to reload - pre 2.9, you'd be reloading
> >     >     > the *whole* field cache every time though. Sounds like you are
> >     >     > trying to deal with those large segments changing anyway :) They
> >     >     > are always an issue when doing RT it seems.
> >     >     >
> >     >     > I don't believe deletes invalidate a field cache - terms from
> >     >     > deleted docs stay in a field cache and segmentreaders use their
> >     >     > freqStream as the fieldcache key. Only when the deletes are
> >     >     > merged out would they invalidate - but because you're writing a
> >     >     > new segment anyway ...
> >     >     >
> >     >     > - Mark
> >     >     >
> >     >     > John Wang wrote:
> >     >     >> I understand what you are saying. Let me detail what I am
> >     >     >> trying to say:
> >     >     >>
> >     >     >> When "currently processed segments" are flushed down, a merge
> >     >     >> may happen. When merges happen, some of those "stable segments"
> >     >     >> will be invalidated, and so will the fieldcache data keyed by
> >     >     >> them.
> >     >     >>
> >     >     >> In a high update environment, such scenarios can happen quite
> >     >     >> often.
> >     >     >>
> >     >     >> The way the default mergePolicy works is that small segments
> >     >     >> get merged into the larger segments. Eventually, what will be
> >     >     >> invalidated would be a large segment, and when that happens, a
> >     >     >> large chunk of the field cache would be invalidated.
> >     >     >>
> >     >     >> Furthermore, in the case where there are high updates, the
> >     >     >> stable segments can be invalidated much sooner when there are
> >     >     >> deletes in those segments, and I would guess the corresponding
> >     >     >> FieldCache needs to be adjusted. Not sure how it is handled
> >     >     >> right now.
> >     >     >>
> >     >     >> Just my two cents, and of course when I find the time I will
> >     >     >> need to run some tests to see.
> >     >     >>
> >     >     >> -John
> >     >     >>
> >     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >     >     >>
> >     >     >>     The NRT reader coming from IndexWriter.getReader() has
> >     >     >>     changes only in the currently processed segments; the other
> >     >     >>     segments stay stable (and so do the IndexReader keys used
> >     >     >>     for the FieldCache). For the consumer it looks like a normal
> >     >     >>     reader (it is in fact a ReadOnlyDirectoryReader) supporting
> >     >     >>     getSequentialSubReaders() and so on.
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >>     -----
> >     >     >>     Uwe Schindler
> >     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
> >     >     >>     http://www.thetaphi.de
> >     >     >>     eMail: uwe@thetaphi.de
> >     >     >>
> >     >     >>
> >     >     >>     ------------------------------------------------------------------------
> >     >     >>
> >     >     >>     *From:* John Wang [mailto:john.wang@gmail.com]
> >     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
> >     >     >>     *To:* java-dev@lucene.apache.org
> >     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >>     Thanks Mark for the pointer!
> >     >     >>
> >     >     >>     I guess my point is with NRT, and when segment files change
> >     >     >>     often, this would be an issue, no?
> >     >     >>
> >     >     >>     Anyway, I can run some tests.
> >     >     >>
> >     >     >>     Thanks
> >     >     >>
> >     >     >>     -John
> >     >     >>
> >     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
> >     >     >>     <markrmiller@gmail.com> wrote:
> >     >     >>
> >     >     >>     1483 - indexsearcher pulls out a reader's subreaders
> >     >     >>     (segmentreaders) and sends a collector over them one by one,
> >     >     >>     rather than using the multireader. So only fc for seg
> >     >     >>     readers that change need to be reloaded.
> >     >     >>
> >     >     >>     - Mark
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >>     http://www.lucidimagination.com (mobile)
> >     >     >>
> >     >     >>
> >     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com>
> >     >     >>     wrote:
> >     >     >>
> >     >     >>>     Hi Yonik:
> >     >     >>>
> >     >     >>>          Actually that is what I am looking for. Can you please
> >     >     >>>     point me to where/how sorting is done per-segment?
> >     >     >>>
> >     >     >>>          When heavy indexing introduces or modifies segments,
> >     >     >>>     would it cause reloading of FieldCache at query time and
> >     >     >>>     thus impact search performance?
> >     >     >>>
> >     >     >>>     thanks
> >     >     >>>
> >     >     >>>     -John
> >     >     >>>
> >     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
> >     >     >>>     <yonik@lucidimagination.com> wrote:
> >     >     >>>
> >     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
> >     >     >>>     <john.wang@gmail.com> wrote:
> >     >     >>>     > Looking at the code, seems there is a disconnect between
> >     >     >>>     > how/when field cache is loaded when
> >     >     >>>     > IndexWriter.getReader() is called.
> >     >     >>>
> >     >     >>>     I'm not sure what you mean by "disconnect"
> >     >     >>>
> >     >     >>>     > Is FieldCache updated?
> >     >     >>>
> >     >     >>>     FieldCache entries are populated on demand, as they always
> >     >     >>>     have been.
> >     >     >>>
> >     >     >>>
> >     >     >>>     > Otherwise, are we reloading FieldCache for each
> >     >     >>>     > reader instance?
> >     >     >>>
> >     >     >>>     Searching/sorting is now per-segment, and so is the use of
> >     >     >>>     the FieldCache.  Segments that don't change shouldn't have
> >     >     >>>     to reload their FieldCache entries.
> >     >     >>>
> >     >     >>>     -Yonik
> >     >     >>>     http://www.lucidimagination.com
> >     >     >>>
> >     >     >>>
> >     >     >>>     ---------------------------------------------------------------------
> >     >     >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     >     >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
> >     >     >>>
> >     >     >>>
> >     >     >>>
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >
> >     >     >
> >     >     > --
> >     >     > - Mark
> >     >     >
> >     >     > http://www.lucidimagination.com
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >
> >     >
> >     >
> >     >
> >
> >
> >     --
> >     - Mark
> >
> >     http://www.lucidimagination.com
> >
> >
> >
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>
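
The per-segment behaviour described in the quoted thread (unchanged segments keep their FieldCache entries across a reopen, while a merge replaces segments and so forces a reload for the merged result) can be modelled roughly as follows. This is a toy model under stated assumptions, not Lucene code: Segment, ToyFieldCache and search() are hypothetical names, a "reader" is just a list of segments, and the cache is keyed by segment identity. It counts how many documents' worth of cache each search has to load.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PerSegmentCacheSketch {

    /** Hypothetical stand-in for a segment-level reader (not a Lucene class). */
    static final class Segment {
        final String name;
        final int docCount;
        Segment(String name, int docCount) { this.name = name; this.docCount = docCount; }
    }

    /** Toy field cache keyed by segment identity; tracks how many docs were loaded. */
    static final class ToyFieldCache {
        private final Map<Segment, int[]> entries = new IdentityHashMap<>();
        int docsLoaded = 0;

        int[] get(Segment segment) {
            int[] cached = entries.get(segment);
            if (cached == null) {
                cached = new int[segment.docCount]; // pretend to load one value per doc
                docsLoaded += segment.docCount;
                entries.put(segment, cached);
            }
            return cached;
        }
    }

    /** Simulates a sorted search: touches the cache once per segment, returns docs loaded. */
    static int search(List<Segment> reader, ToyFieldCache cache) {
        int before = cache.docsLoaded;
        for (Segment segment : reader) {
            cache.get(segment);
        }
        return cache.docsLoaded - before;
    }

    public static void main(String[] args) {
        ToyFieldCache cache = new ToyFieldCache();
        Segment big = new Segment("_0", 1_000_000);
        Segment small = new Segment("_1", 1_000);

        List<Segment> reader1 = Arrays.asList(big, small);
        System.out.println("cold search loaded  " + search(reader1, cache)); // 1001000

        // Reopen after a flush: existing segment objects are shared, one new segment appears.
        Segment flushed = new Segment("_2", 10);
        List<Segment> reader2 = Arrays.asList(big, small, flushed);
        System.out.println("after flush loaded  " + search(reader2, cache)); // 10

        // Merge of the small segments: "_1" and "_2" are replaced by "_3"; "_0" is untouched.
        Segment merged = new Segment("_3", small.docCount + flushed.docCount);
        List<Segment> reader3 = Arrays.asList(big, merged);
        System.out.println("after merge loaded  " + search(reader3, cache)); // 1010
    }
}
```

The same model also shows the concern raised in the thread: if the merge had involved the large segment, the reload cost would have been on the order of the whole index rather than a few small segments.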
