lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: 2.9 NRT w.r.t. sorting and field cache
Date Wed, 23 Sep 2009 00:33:56 GMT
Don't take me too seriously, John - I doubt anyone does :)

And I wasn't implying Mike's time was more valuable than yours. I was
being ... uh ... me :)

And I don't claim that all of your many questions could have been answered
in 5 seconds ;)

Just the ones you were asking - it's very quick (at least with Eclipse)
to see that there is no default impl. It's also very quick to see that a
SegmentReader is passed to the warm method every time. I think the
parameter is just a generic IndexReader because you would warm a
multi-reader the same way as a SegmentReader.
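As a rough illustration of that point, here is a toy model in plain Java. It is not Lucene code: Reader, Segment, Composite, ReaderWarmer and LeafWarmer are hypothetical names that only imitate the shape of IndexReader and IndexWriter.IndexReaderWarmer. It assumes, as with getSequentialSubReaders() in 2.9, that a composite reader exposes its segments and a single segment returns null. A warmer written against the generic type can then warm one merged segment, or walk a multi-reader segment by segment, with the same code and without building a single top-level cache.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types imitate the shape of Lucene 2.9's IndexReader
// and IndexWriter.IndexReaderWarmer but are NOT the real classes.
public class WarmerSketch {

  // Hypothetical stand-in for IndexReader.
  interface Reader {
    // Sub-readers of a composite reader, or null for a single segment.
    Reader[] getSequentialSubReaders();
    String name();
  }

  // Hypothetical single segment (think SegmentReader).
  static final class Segment implements Reader {
    private final String name;
    Segment(String name) { this.name = name; }
    public Reader[] getSequentialSubReaders() { return null; }
    public String name() { return name; }
  }

  // Hypothetical composite reader (think a multi-segment reader).
  static final class Composite implements Reader {
    private final Reader[] subs;
    Composite(Reader... subs) { this.subs = subs; }
    public Reader[] getSequentialSubReaders() { return subs; }
    public String name() { return "composite"; }
  }

  // Same shape as IndexReaderWarmer: one abstract warm(reader) method.
  static abstract class ReaderWarmer {
    abstract void warm(Reader reader);
  }

  // Warms one segment at a time, recursing through composites, so a
  // multi-reader is handled exactly like a run of individual segments.
  static final class LeafWarmer extends ReaderWarmer {
    final List<String> warmed = new ArrayList<String>();

    @Override
    void warm(Reader reader) {
      Reader[] subs = reader.getSequentialSubReaders();
      if (subs == null) {
        // A real warmer would load this segment's FieldCache entry here.
        warmed.add(reader.name());
      } else {
        for (Reader sub : subs) {
          warm(sub);
        }
      }
    }
  }

  public static void main(String[] args) {
    LeafWarmer warmer = new LeafWarmer();
    warmer.warm(new Segment("_5"));                                   // a freshly merged segment
    warmer.warm(new Composite(new Segment("_0"), new Segment("_1"))); // a multi-segment reader
    System.out.println(warmer.warmed); // prints [_5, _0, _1]
  }
}
```

The recursion is only a defensive choice for the sketch; per the message above, the merged-segment warmer is handed a single SegmentReader, so the leaf branch is the one that normally runs.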

I was just suggesting you look at the code a bit, because I think it's
fairly easy to figure out the basics of the warmer (hey, if I can do it
;) ).

Again, don't take me too seriously. I send out my comments faster than I
can think of them. And I've probably wasted more of Mike's time than anyone.

The only way you will load the entire FieldCache is to use a top-level
reader outside of the core API - the core API works per segment now. And
the IndexReaderWarmer is always passed a SegmentReader from the readerPool.
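To make the per-segment reloading concrete, here is a small toy model in plain Java. It is not Lucene's FieldCache: Segment, Cache, getInts, searchAll and demo are hypothetical names, and the cache is simply a WeakHashMap keyed by segment identity, standing in for a per-segment cache key. It counts how many "expensive" loads happen as a reader is reopened after a flush, and then after a merge whose new segment is warmed before searching.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Toy model only, NOT Lucene's FieldCache: per-segment int values cached by
// segment identity, so a reopened reader only loads segments it has not seen.
public class PerSegmentCacheSketch {

  // Hypothetical immutable segment holding one int field value per document.
  static final class Segment {
    final String name;
    final int[] values;
    Segment(String name, int... values) { this.name = name; this.values = values; }
  }

  // Segment does not override equals/hashCode, so entries are keyed by identity;
  // weak keys let entries for segments dropped after a merge be collected.
  static final class Cache {
    private final Map<Segment, int[]> entries = new WeakHashMap<Segment, int[]>();
    int loads = 0;

    int[] getInts(Segment seg) {
      int[] cached = entries.get(seg);
      if (cached == null) {
        loads++;                      // stands in for the costly per-segment load
        cached = seg.values.clone();
        entries.put(seg, cached);
      }
      return cached;
    }
  }

  // A sorted search touches the cache entry of every segment in the reader.
  static void searchAll(Cache cache, List<Segment> reader) {
    for (Segment seg : reader) {
      cache.getInts(seg);
    }
  }

  // Returns the cumulative load count after each step.
  static int[] demo() {
    Cache cache = new Cache();
    Segment s0 = new Segment("_0", 5, 3);
    Segment s1 = new Segment("_1", 7);
    Segment s2 = new Segment("_2", 1);

    searchAll(cache, Arrays.asList(s0, s1));      // first reader: loads _0 and _1
    int afterOpen = cache.loads;

    searchAll(cache, Arrays.asList(s0, s1, s2));  // reopen after a flush: only _2 loads
    int afterFlush = cache.loads;

    // Merging _1 and _2 yields a brand-new segment whose entry must be loaded;
    // warming it first means the next search finds everything cached.
    Segment merged = new Segment("_3", 7, 1);
    cache.getInts(merged);                        // what a merged-segment warmer would do
    searchAll(cache, Arrays.asList(s0, merged));  // no new loads at search time
    int afterMerge = cache.loads;

    return new int[] {afterOpen, afterFlush, afterMerge};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo())); // prints [2, 3, 4]
  }
}
```

The merge step is where the concern raised in the thread shows up: the larger the merged segment, the bigger that one load, and that is the cost the merged-segment warmer lets you pay up front without blocking ongoing updates.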

- Mark

John Wang wrote:
> Mark:
>
> I did spend at least a quarter of an ounce. :) And I am sure Mike's
> time is more valuable than mine, but it was meant to be a "double-check".
>
> I was under the impression there is a default impl from previous email
> threads on how to handle field cache warming; perhaps I misunderstood.
>
> The real question here is "warms the reader". From a public API point
> of view, I wasn't sure whether passing in an IndexReader impl is something
> we can do to avoid loading the entire field cache, e.g. would I need
> to downcast? Can it be a filtered reader? etc.
>
> If you think there is something I could have done within 5 sec, please
> point me in the right direction.
>
> Thanks
>
> -John
>
> On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmiller@gmail.com>
> wrote:
>
>     Come on dude :) Spend a half ounce of effort first. Mike's time is too
>     valuable!
>
>     Luckily mine is not.
>
>     There is no default impl - the class is dead simple (and the class has
>     been pointed out like 3 times in this thread - I'm not even fully
>     following and I know where to find it):
>
>      public static abstract class IndexReaderWarmer {
>        public abstract void warm(IndexReader reader) throws IOException;
>      }
>
>     Now pass something in that warms the reader. Load a FieldCache - do a
>     search. Do the hokey pokey and turn yourself around ...
>
>     Investigation time: 5 seconds.
>
>     John Wang wrote:
>     > Hi Michael:
>     >
>     >      Thanks for the pointer!
>     >
>     >       Pardon my ignorance, but I am still not seeing the connection
>     > between this API and per-segment loading of FieldCache (the API takes
>     > in an IndexReader instead of, say, a SegmentReader[]).
>     >
>     >       Can you point me to maybe the default impl of IndexReaderWarmer
>     > to help me understand?
>     >
>     > Thanks
>     >
>     > -John
>     >
>     > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
>     > <lucene@mikemccandless.com> wrote:
>     >
>     >     This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you
>     >     can warm the reader w/o blocking ongoing updates.
>     >
>     >     Mike
>     >
>     >     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
>     >     <markrmiller@gmail.com> wrote:
>     >     > Right - when a large segment is invalidated, you will have a bigger
>     >     > FieldCache piece to reload - pre 2.9, you'd be reloading the *whole*
>     >     > FieldCache every time, though. Sounds like you are trying to deal
>     >     > with those large segments changing anyway :) They are always an
>     >     > issue when doing RT, it seems.
>     >     >
>     >     > I don't believe deletes invalidate a field cache - terms from deleted
>     >     > docs stay in a field cache, and SegmentReaders use their freqStream
>     >     > as the FieldCache key. Only when the deletes are merged out would
>     >     > they invalidate - but because you're writing a new segment anyway ...
>     >     >
>     >     > - Mark
>     >     >
>     >     > John Wang wrote:
>     >     >> I understand what you are saying. Let me detail what I am trying
>     >     >> to say:
>     >     >>
>     >     >> When "currently processed segments" are flushed, merges may
>     >     >> happen. When merges happen, some of those "stable segments" will be
>     >     >> invalidated, and so will the FieldCache data keyed by them.
>     >     >>
>     >     >> In a high-update environment, such scenarios can happen quite
>     >     >> often.
>     >     >>
>     >     >> The way the default mergePolicy works is that small segments get
>     >     >> merged into the larger segments. Eventually, what will be
>     >     >> invalidated would be a large segment, and when that happens, a
>     >     >> large chunk of the field cache would be invalidated.
>     >     >>
>     >     >> Furthermore, with heavy updates, the stable segments can be
>     >     >> invalidated much sooner when there are deletes in those segments,
>     >     >> and I would guess the corresponding FieldCache needs to be
>     >     >> adjusted. Not sure how it is handled right now.
>     >     >>
>     >     >> Just my two cents, and of course when I find the time I will need
>     >     >> to run some tests to see.
>     >     >>
>     >     >> -John
>     >     >>
>     >     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <uwe@thetaphi.de>
>     >     >> wrote:
>     >     >>
>     >     >>     The NRT reader coming from IndexWriter.getReader() only has
>     >     >>     changes in the currently processed segments; the other segments
>     >     >>     stay stable (and so do their IndexReader keys used for the
>     >     >>     FieldCache). For the consumer it looks like a normal reader (it
>     >     >>     is in fact a ReadOnlyDirectoryReader) supporting
>     >     >>     getSequentialSubReaders() and so on.
>     >     >>
>     >     >>
>     >     >>
>     >     >>     -----
>     >     >>     Uwe Schindler
>     >     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     >     >>     http://www.thetaphi.de
>     >     >>     eMail: uwe@thetaphi.de
>     >     >>
>     >     >>
>     >     >>     ------------------------------------------------------------------------
>     >     >>
>     >     >>     *From:* John Wang [mailto:john.wang@gmail.com]
>     >     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
>     >     >>     *To:* java-dev@lucene.apache.org
>     >     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
>     >     >>
>     >     >>
>     >     >>
>     >     >>     Thanks Mark for the pointer!
>     >     >>
>     >     >>     I guess my point is that with NRT, when segment files change
>     >     >>     often, this would be an issue, no?
>     >     >>
>     >     >>     Anyway, I can run some tests.
>     >     >>
>     >     >>     Thanks
>     >     >>
>     >     >>     -John
>     >     >>
>     >     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
>     >     >>     <markrmiller@gmail.com> wrote:
>     >     >>
>     >     >>     LUCENE-1483 - IndexSearcher pulls out a reader's subreaders
>     >     >>     (SegmentReaders) and sends a collector over them one by one,
>     >     >>     rather than using the MultiReader. So only the FieldCache
>     >     >>     entries for segment readers that change need to be reloaded.
>     >     >>
>     >     >>     - Mark
>     >     >>
>     >     >>
>     >     >>
>     >     >>     http://www.lucidimagination.com (mobile)
>     >     >>
>     >     >>
>     >     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.wang@gmail.com>
>     >     >>     wrote:
>     >     >>
>     >     >>>     Hi Yonik:
>     >     >>>
>     >     >>>          Actually that is what I am looking for. Can you please
>     >     >>>     point me to where/how sorting is done per-segment?
>     >     >>>
>     >     >>>          When heavy indexing introduces or modifies segments,
>     >     >>>     would it cause reloading of FieldCache at query time and
>     >     >>>     thus impact search performance?
>     >     >>>
>     >     >>>     thanks
>     >     >>>
>     >     >>>     -John
>     >     >>>
>     >     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
>     >     >>>     <yonik@lucidimagination.com> wrote:
>     >     >>>
>     >     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
>     >     >>>     <john.wang@gmail.com> wrote:
>     >     >>>     > Looking at the code, it seems there is a disconnect between
>     >     >>>     > how/when the field cache is loaded when IndexWriter.getReader()
>     >     >>>     > is called.
>     >     >>>
>     >     >>>     I'm not sure what you mean by "disconnect"
>     >     >>>
>     >     >>>     > Is FieldCache updated?
>     >     >>>
>     >     >>>     FieldCache entries are populated on demand, as they always
>     >     >>>     have been.
>     >     >>>
>     >     >>>
>     >     >>>     > Otherwise, are we reloading FieldCache for each
>     >     >>>     > reader instance?
>     >     >>>
>     >     >>>     Searching/sorting is now per-segment, and so is the use of the
>     >     >>>     FieldCache.  Segments that don't change shouldn't have to
>     >     >>>     reload their FieldCache entries.
>     >     >>>
>     >     >>>     -Yonik
>     >     >>>     http://www.lucidimagination.com
>     >     >>>
>     >     >>>
>     >     >>>     ---------------------------------------------------------------------
>     >     >>>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     >     >>>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>     >     >>>
>     >     >>>
>     >     >>>
>     >     >>
>     >     >>
>     >     >>
>     >     >
>     >     >
>     >     > --
>     >     > - Mark
>     >     >
>     >     > http://www.lucidimagination.com
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>
>
>
>
>
>
>
>
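The per-segment search from the 1483 change Mark mentions in the quoted thread (IndexSearcher sending a collector over each segment reader in turn) can be sketched with another toy model in plain Java. The SegmentCollector interface below only mimics the shape of 2.9's Collector.setNextReader(reader, docBase) and collect(doc); Segment, MinCollector and search are hypothetical names, and every document is treated as a match. The point it shows is that each segment's own values (like a per-segment FieldCache array) are used directly, while docBase maps segment-local ids back to top-level ids.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only, NOT Lucene's IndexSearcher/Collector: per-segment collection
// where the searcher visits sub-readers in order and passes each one's docBase.
public class PerSegmentSearchSketch {

  // Hypothetical segment: one sort value per document, i.e. what a
  // per-segment FieldCache array would hold for that segment.
  static final class Segment {
    final int[] sortValues;
    Segment(int... sortValues) { this.sortValues = sortValues; }
    int maxDoc() { return sortValues.length; }
  }

  // Mimics the shape of 2.9's Collector.setNextReader(reader, docBase)
  // and collect(doc); doc ids passed to collect are segment-local.
  interface SegmentCollector {
    void setNextReader(Segment segment, int docBase);
    void collect(int doc);
  }

  // Tracks the top-level id of the document with the smallest sort value.
  static final class MinCollector implements SegmentCollector {
    private Segment current;
    private int docBase;
    int bestDoc = -1;
    int bestValue = Integer.MAX_VALUE;

    public void setNextReader(Segment segment, int docBase) {
      this.current = segment;   // switch to this segment's own values
      this.docBase = docBase;
    }

    public void collect(int doc) {
      int value = current.sortValues[doc];
      if (value < bestValue) {
        bestValue = value;
        bestDoc = docBase + doc; // map segment-local id to top-level id
      }
    }
  }

  // Treats every document as a match; docBase grows by each segment's maxDoc.
  static void search(List<Segment> subReaders, SegmentCollector collector) {
    int docBase = 0;
    for (Segment seg : subReaders) {
      collector.setNextReader(seg, docBase);
      for (int doc = 0; doc < seg.maxDoc(); doc++) {
        collector.collect(doc);
      }
      docBase += seg.maxDoc();
    }
  }

  public static void main(String[] args) {
    MinCollector collector = new MinCollector();
    search(Arrays.asList(new Segment(9, 4), new Segment(6, 2, 8)), collector);
    System.out.println(collector.bestDoc + " " + collector.bestValue); // prints 3 2
  }
}
```

Because each segment's values are looked up on their own, a reopen that adds or merges segments only changes which arrays are visited, not the arrays belonging to untouched segments.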







