lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Harris <rygu...@gmail.com>
Subject Re: Will doc ids ever change if nothing is deleted?
Date Fri, 14 May 2010 17:08:53 GMT
Could you address your needs by assigning each document a unique
identifier (maybe you have a natural key, or maybe you could generate
a new GUID or something for each doc), and using those identifiers,
rather than internal Lucene docids, to track documents between the
search stage and the loading stage?

On Thu, May 13, 2010 at 7:12 PM, Nigel <nigelspleen@gmail.com> wrote:
> Yes, I realize that storing document IDs persistently (for example) is a Bad
> Idea. Partly I'm asking just to make sure I understand what's going on.
>
> There is a use case, though.  In some cases between when we do a search and
> return some doc ids, and when we use those doc ids to load some documents,
> the index could be reopened.  Normally the same IndexReader instance would
> be used for both the searching and the loading and so you'd be assured that
> the doc ids would be stable.  But sometimes when searches are distributed
> across multiple remote indexes (and non-Lucene search systems) some
> aggregation needs to occur before results are loaded, and references to the
> IndexReaders can't be maintained across that process.  Currently we remember
> the index version associated with a search result (i.e. a set of document
> ids) so we can verify when loading the documents that the version is the
> same, and therefore the IDs are still valid.  I'm wondering if this is
> overly restrictive.  For example, if I knew that no documents had been
> deleted, and if (per my original question) only deletions would trigger
> renumbering, then the doc ids from a search result could be used on an index
> with a newer version.
>
> Thanks,
> Chris
>
> On Thu, May 13, 2010 at 9:51 PM, Erick Erickson <erickerickson@gmail.com>wrote:
>
>> Why do you care? That is, what do you want to accomplish
>> that makes document ID renumbering relevant?
>>
>> In general, it is unwise to rely on Lucene-assigned document
>> IDs. If you need an invariant document ID, assign it yourself.
>>
>> If this is off base, could you supply your use-case?
>>
>> Best
>> Erick
>>
>> On Thu, May 13, 2010 at 9:38 PM, Nigel <nigelspleen@gmail.com> wrote:
>>
>> > The FAQ clearly states that document IDs will not be re-assigned unless
>> > something was deleted.
>> >
>> >
>> http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F
>> >
>> > However, a number of other emails and posts I've read mention that
>> > renumbering occurs when segments are merged.  Maybe what people mean
>> > is simply that when something is deleted, nothing is immediately
>> > renumbered,
>> > and it's not until merge time that the renumbering happens.  (This is my
>> > understanding of how it works.)
>> >
>> > Just so I'm 100% clear, if I never delete anything, will the IDs ever
>> > change?
>> >
>> > Thanks,
>> > Chris
>> >
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message