lucene-java-user mailing list archives

From Nigel <nigelspl...@gmail.com>
Subject Re: Will doc ids ever change if nothing is deleted?
Date Fri, 14 May 2010 17:45:52 GMT
We do assign GUIDs to everything in the index for cases where longer-term
identity is necessary.  For this case, using GUIDs would be prohibitively
expensive as we'd need to load the GUIDs for all search results.  We might
have tens of thousands of results and only want to load a random 100, so
having to load GUIDs for 10,000 results and then doing 100 GUID searches is
vastly less efficient than just collecting 10,000 doc ids and doing 100
doc(id) lookups.
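
To make the comparison concrete, here is a minimal sketch of the doc-id approach: collect the hit ids, shuffle, and load only the sample. `DocLoader` and `SampleLoader` are hypothetical names made up for illustration; `DocLoader` stands in for whatever loads a stored document by doc id (the role `IndexReader.document(int)` plays in Lucene), so the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a by-doc-id loader; not a Lucene type.
interface DocLoader {
    String load(int docId);
}

final class SampleLoader {
    // Pick up to sampleSize doc ids uniformly at random from the collected hits.
    static List<Integer> sample(List<Integer> hitDocIds, int sampleSize, Random rnd) {
        List<Integer> copy = new ArrayList<>(hitDocIds); // leave the caller's list untouched
        Collections.shuffle(copy, rnd);
        return new ArrayList<>(copy.subList(0, Math.min(sampleSize, copy.size())));
    }

    // Load only the sampled documents; the remaining hits are never loaded.
    static List<String> loadSample(List<Integer> hitDocIds, int sampleSize,
                                   Random rnd, DocLoader loader) {
        List<String> docs = new ArrayList<>();
        for (int docId : sample(hitDocIds, sampleSize, rnd)) {
            docs.add(loader.load(docId));
        }
        return docs;
    }
}
```

With 10,000 collected ids and a sample size of 100, the loader is called 100 times. The GUID variant would need a stored-field read for every hit before any sampling could happen.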

Almost irrespective of the use case, I'm still interested to know whether
doc ids can be changed as a result of merging, optimization, etc. if no
documents are deleted.
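
To pin down the behavior I'm asking about, here is a toy model of my understanding. It is not Lucene code: `ToySegment` and `ToyIndex` are made-up names, and the model assumes a merge concatenates segments in their existing order and drops deleted documents. Whether a real merge policy always preserves that order is exactly the part I'd like confirmed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a "segment" is a list of document names plus delete flags.
final class ToySegment {
    final List<String> docs = new ArrayList<>();
    final List<Boolean> deleted = new ArrayList<>();

    ToySegment add(String name) {
        docs.add(name);
        deleted.add(false);
        return this;
    }

    // Deleting only marks the document; nothing is renumbered here.
    void delete(int localId) {
        deleted.set(localId, true);
    }
}

final class ToyIndex {
    // A doc id is the document's position when segments are laid end to end.
    // Deleted documents still occupy a slot until a merge removes them.
    static int docIdOf(List<ToySegment> segments, String name) {
        int base = 0;
        for (ToySegment s : segments) {
            int i = s.docs.indexOf(name);
            if (i >= 0 && !s.deleted.get(i)) {
                return base + i;
            }
            base += s.docs.size();
        }
        return -1; // not found, or deleted
    }

    // Assumed merge behavior: keep segment order, squeeze out deleted docs.
    static ToySegment merge(List<ToySegment> segments) {
        ToySegment merged = new ToySegment();
        for (ToySegment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    merged.add(s.docs.get(i));
                }
            }
        }
        return merged;
    }
}
```

In this model, merging two segments with no deletions leaves every doc id unchanged. Marking a document deleted changes nothing until the next merge, at which point every later document shifts down by one.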

Thanks,
Chris

On Fri, May 14, 2010 at 1:08 PM, Chris Harris <ryguasu@gmail.com> wrote:

> Could you address your needs by assigning each document a unique
> identifier (maybe you have a natural key, or maybe you could generate
> a new GUID or something for each doc), and using those identifiers,
> rather than internal Lucene docids, to track documents between the
> search stage and the loading stage?
>
> On Thu, May 13, 2010 at 7:12 PM, Nigel <nigelspleen@gmail.com> wrote:
> > Yes, I realize that storing document IDs persistently (for example) is a Bad
> > Idea. Partly I'm asking just to make sure I understand what's going on.
> >
> > There is a use case, though.  In some cases between when we do a search and
> > return some doc ids, and when we use those doc ids to load some documents,
> > the index could be reopened.  Normally the same IndexReader instance would
> > be used for both the searching and the loading and so you'd be assured that
> > the doc ids would be stable.  But sometimes when searches are distributed
> > across multiple remote indexes (and non-Lucene search systems) some
> > aggregation needs to occur before results are loaded, and references to the
> > IndexReaders can't be maintained across that process.  Currently we remember
> > the index version associated with a search result (i.e. a set of document
> > ids) so we can verify when loading the documents that the version is the
> > same, and therefore the IDs are still valid.  I'm wondering if this is
> > overly restrictive.  For example, if I knew that no documents had been
> > deleted, and if (per my original question) only deletions would trigger
> > renumbering, then the doc ids from a search result could be used on an index
> > with a newer version.
> >
> > Thanks,
> > Chris
> >
> > On Thu, May 13, 2010 at 9:51 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> Why do you care? That is, what do you want to accomplish
> >> that makes document ID renumbering relevant?
> >>
> >> In general, it is unwise to rely on Lucene-assigned document
> >> IDs. If you need an invariant document ID, assign it yourself.
> >>
> >> If this is off base, could you supply your use-case?
> >>
> >> Best
> >> Erick
> >>
> >> On Thu, May 13, 2010 at 9:38 PM, Nigel <nigelspleen@gmail.com> wrote:
> >>
> >> > The FAQ clearly states that document IDs will not be re-assigned unless
> >> > something was deleted.
> >> >
> >> > http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F
> >> >
> >> > However, a number of other emails and posts I've read mention that
> >> > renumbering occurs when segments are merged.  Maybe what people mean
> >> > is simply that when something is deleted, nothing is immediately
> >> > renumbered, and it's not until merge time that the renumbering happens.
> >> > (This is my understanding of how it works.)
> >> >
> >> > Just so I'm 100% clear, if I never delete anything, will the IDs ever
> >> > change?
> >> >
> >> > Thanks,
> >> > Chris
> >> >
> >>
> >
>
