lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: building custom cache - using lucene docids
Date Sun, 24 Nov 2013 13:31:36 GMT
bq: Do I understand you correctly that when two segments get merged, the
docids (of the original segments) remain the same?

The original segments are unchanged; segments are _never_ changed after
they're closed. But they'll be thrown away. Say you have segment1 and
segment2 that get merged into segment3. As soon as the last searcher
that is looking at segment1 and segment2 is closed, those two segments
will be deleted from your disk.

But for any given doc, the docid in segment3 will very likely be different
than it was in segment1 or 2.
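A toy model may make the renumbering concrete. This is a sketch under simplifying assumptions, not Lucene code: the `Segment` record, `merge`, and the string document ids are hypothetical, and a "docid" here is just a document's position inside its segment. A deleted document keeps its slot in the closed segment; merging copies only the live documents into a new segment, which numbers them from 0 again.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model (hypothetical, not Lucene's classes) of docid renumbering on merge.
public class MergeRenumbering {

    // A closed segment: docid == index into externalIds; deletions only set a bit.
    record Segment(List<String> externalIds, BitSet deleted) {
        int docIdOf(String externalId) {
            int docId = externalIds.indexOf(externalId);
            // A deleted document keeps its slot but is no longer "live".
            return (docId >= 0 && !deleted.get(docId)) ? docId : -1;
        }
    }

    // Merging copies only live documents, in order, into a brand-new segment,
    // so the merged segment assigns docids 0, 1, 2, ... from scratch.
    static Segment merge(Segment a, Segment b) {
        List<String> merged = new ArrayList<>();
        for (Segment s : List.of(a, b)) {
            for (int docId = 0; docId < s.externalIds().size(); docId++) {
                if (!s.deleted().get(docId)) {
                    merged.add(s.externalIds().get(docId));
                }
            }
        }
        return new Segment(List.copyOf(merged), new BitSet());
    }

    public static void main(String[] args) {
        BitSet deletedInSeg1 = new BitSet();
        deletedInSeg1.set(0); // "doc-A" was deleted, e.g. replaced by an update
        Segment segment1 = new Segment(List.of("doc-A", "doc-B", "doc-C"), deletedInSeg1);
        Segment segment2 = new Segment(List.of("doc-D", "doc-E"), new BitSet());

        Segment segment3 = merge(segment1, segment2);

        System.out.println(segment1.docIdOf("doc-C")); // prints 2
        System.out.println(segment3.docIdOf("doc-C")); // prints 1
        System.out.println(segment2.docIdOf("doc-E")); // prints 1
        System.out.println(segment3.docIdOf("doc-E")); // prints 3
    }
}
```

In this model doc-C moves from docid 2 to docid 1, so a cache that stored the old number would silently point at the wrong document once segment1 is gone.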

I think you're reading too much into LUCENE-2897. I'm pretty sure the
segment in question is not available to you anyway before this rewrite is
done, but I freely admit I don't know much about it.

You're probably going to get into the whole PerSegment family of operations,
which is something I'm not all that familiar with, so I'll leave
explanations to others.
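The per-segment idea can be sketched as a cache keyed by segment identity that stores segment-local docids and converts them to searcher-wide docids with each segment's current starting offset. Everything below is hypothetical (`SegmentView`, `segmentKey`, `refresh`, `evenDocs`); Lucene's per-segment reader contexts do carry a `docBase`, and segment readers expose a cache key meant for this kind of keying, but the exact API differs across Lucene versions. The `refresh` method stands in for whatever hook runs when a new searcher opens, such as a Solr user cache being warmed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of a per-segment cache; all names here are hypothetical.
public class PerSegmentCache {

    // One segment as seen by one searcher: identity, size, and starting offset.
    record SegmentView(String segmentKey, int maxDoc, int docBase) {}

    // Segment-local results, keyed by segment identity (never by position).
    private final Map<String, BitSet> bySegment = new HashMap<>();
    private int computations = 0;

    int computations() { return computations; }

    // Run when a new searcher opens: drop entries for segments that are gone,
    // reuse entries for segments that survive, compute only new segments.
    List<Integer> refresh(List<SegmentView> searcher, Function<SegmentView, BitSet> compute) {
        Set<String> present = searcher.stream()
                .map(SegmentView::segmentKey)
                .collect(Collectors.toSet());
        bySegment.keySet().retainAll(present);

        List<Integer> topLevel = new ArrayList<>();
        for (SegmentView seg : searcher) {
            BitSet local = bySegment.computeIfAbsent(seg.segmentKey(), k -> {
                computations++;
                return compute.apply(seg);
            });
            for (int d = local.nextSetBit(0); d >= 0; d = local.nextSetBit(d + 1)) {
                topLevel.add(seg.docBase() + d); // searcher-wide docid = docBase + local docid
            }
        }
        return topLevel;
    }

    // Hypothetical per-segment computation: every even local docid "matches".
    static BitSet evenDocs(SegmentView seg) {
        BitSet bits = new BitSet();
        for (int d = 0; d < seg.maxDoc(); d += 2) {
            bits.set(d);
        }
        return bits;
    }

    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();

        List<SegmentView> searcher1 = List.of(
                new SegmentView("_0", 4, 0),
                new SegmentView("_1", 3, 4));
        System.out.println(cache.refresh(searcher1, PerSegmentCache::evenDocs)); // [0, 2, 4, 6]

        // Later searcher: "_0" is gone and "_1" survives, but now starts at docBase 0.
        List<SegmentView> searcher2 = List.of(
                new SegmentView("_1", 3, 0),
                new SegmentView("_2", 5, 3));
        System.out.println(cache.refresh(searcher2, PerSegmentCache::evenDocs)); // [0, 2, 3, 5, 7]
        System.out.println(cache.computations()); // 3
    }
}
```

Keying by segment identity rather than by "last segment or not" avoids guessing which segments are stable: a surviving segment is reused even though its offset moved. A real cache would still have to check deletions at query time, since a closed segment can gain deletions.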


On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla <roman.chyla@gmail.com> wrote:

> Hi Erick,
> Many thanks for the info. An additional question:
>
> Do I understand you correctly that when two segments get merged, the docids
> (of the original segments) remain the same?
>
> (unless, perhaps, they were merged using the last index segment, which
> was opened for writing and whose docids could have suddenly changed in a
> commit just before the merge)
>
> Yes, you guessed right that I am putting my code into the custom cache, so
> it gets notified on index changes. I don't know how yet, but I think I can
> find my way to the currently active, open (last) index segment, which is
> actively updated (as opposed to just being merged). So my definition of
> the 'not last ones' is: segments where docids don't change. I'd be grateful
> if someone could spot any problem with this assumption.
>
> roman
>
>
>
>
> On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > bq: But can I assume that docids in other segments (other than the
> > last one) will be relatively stable?
> >
> > Kinda. Maybe. Maybe not. It depends on how you define "other than the
> > last one".
> >
> > The key is that the internal doc IDs may change when segments are
> > merged. And old segments get merged. Doc IDs will _never_ change
> > in a segment once it's closed (although as you note they may be
> > marked as deleted). But that segment may be written to a new segment
> > when merging, and the internal ID for a given document in the new
> > segment bears no relationship to the internal ID in the old segment.
> >
> > BTW, I think you only really care when a new searcher is opened. There is
> > a UserCache (see solrconfig.xml) that gets notified when a new searcher
> > is being opened, to give it an opportunity to refresh itself. Is that
> > useful?
> >
> > As long as a searcher is open, it's guaranteed that nothing is changing.
> > Hard commits with openSearcher=false don't open new searchers, which
> > is why changes aren't visible until a softCommit or a hard commit with
> > openSearcher=true, even though the segments are closed.
> >
> > FWIW,
> > Erick
> >
> >
> >
> >
> > On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla <roman.chyla@gmail.com> wrote:
> >
> > > Hi,
> > > docids are 'ephemeral', but I'd still like to build a search cache with
> > > them (they allow for the fastest joins).
> > >
> > > I'm seeing docids keep changing with updates (especially in the last
> > > index segment), as per
> > > https://issues.apache.org/jira/browse/LUCENE-2897
> > >
> > > That would be fine, because I could build the cache from a diff (of the
> > > index state) plus a full read of the latest index segment. But can I
> > > assume that docids in other segments (other than the last one) will be
> > > relatively stable (i.e. when an old doc is deleted, its docid is marked
> > > as removed; updating a doc = deleting the old one and creating a new
> > > docid)?
> > >
> > > thanks
> > >
> > > roman
> > >
> >
>
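The point-in-time behavior described in the thread (an open searcher never sees changes; a hard commit with openSearcher=false makes segments durable without making them visible) can be illustrated with another toy model. The `Index`, `hardCommit`, `openSearcher`, and `Searcher` names are hypothetical and are not Solr's API; the sketch only shows that a searcher captures the segment list at the moment it is opened.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical classes) of point-in-time searchers.
public class SearcherSnapshot {

    // Writer-side state: segments made durable by hard commits.
    static class Index {
        private final List<String> committedSegments = new ArrayList<>();

        void hardCommit(String newSegment) {
            committedSegments.add(newSegment);
        }

        // Opening a searcher takes an immutable copy of the current segment list.
        Searcher openSearcher() {
            return new Searcher(List.copyOf(committedSegments));
        }
    }

    // A searcher's view never changes after it is opened.
    record Searcher(List<String> segments) {}

    public static void main(String[] args) {
        Index index = new Index();
        index.hardCommit("_0");
        Searcher current = index.openSearcher();

        // Like a hard commit with openSearcher=false: the new segment is durable,
        // but the searcher already in use keeps its old view.
        index.hardCommit("_1");
        System.out.println(current.segments()); // [_0]

        // Like openSearcher=true: only a newly opened searcher sees the change,
        // which is the moment a per-segment cache would be refreshed.
        current = index.openSearcher();
        System.out.println(current.segments()); // [_0, _1]
    }
}
```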
