lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Will doc ids ever change if nothing is deleted?
Date Tue, 18 May 2010 09:41:11 GMT
If you never delete docs, then w/ the default merge policy, the docIDs
should never change.

But... this should be considered an impl detail of Lucene.  In theory
someday this could change.  EG there's an issue open (LUCENE-1076) to
allow a merge policy to select out-of-order merges, which they cannot
do today.

Mike

On Fri, May 14, 2010 at 4:20 PM, Nigel <nigelspleen@gmail.com> wrote:
> Right, but my question was whether merging segments will renumber docs *if
> no documents are deleted*.  Empirically, the answer is no.  I've written
> test code that indexes documents with a field equal to each document's
> current id, and verified that the ids still match the field values even
> after merges and optimizes are done, if no docs are deleted.  This seems to
> indicate that merging always preserves the current id order, which
> is perhaps the obvious way to implement merging.  (Though of course an
> alternative reasonable implementation, for example. might have been to add
> everything from one segment to the merged segment, then add everything from
> the next segment, etc., thus renumbering everything.)
>
> Given all the misinformation or ambiguous information on this topic that
> I've found, I was hoping to clear this up definitively.  My test case
> doesn't prove that there's not some way for ids to change in certain
> circumstances (apart from deletions, which is documented clearly).  But
> right now I don't see any reason not to believe the FAQ entry mentioned in
> my original email.
>
> Thanks,
> Chris
>
> On Fri, May 14, 2010 at 2:04 PM, Chris Lu <chris.lu@gmail.com> wrote:
>
>> The doc id will get changed if the segments are merged. The doc id is more
>> depending on the order of documents being added.
>>
>> Just think about it. The doc ids are starting from 0 to N. And when some
>> documents are deleted, they are marked deleted on .del file. So no change
>> there. When some documents are added, the id is N+1.
>>
>> Of course, if some documents from other segments are merged. The documents
>> in one segment will "lose" its doc id.
>>
>> --
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per request) got
>> 2.6 Million Euro funding!
>>
>>
>> On 05/13/2010 06:38 PM, Nigel wrote:
>>
>>> The FAQ clearly states that document IDs will not be re-assigned unless
>>> something was deleted.
>>>
>>> http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F
>>>
>>> However, a number of other emails and posts I've read mention that
>>> renumbering occurs when segments are merged.  Maybe what people mean
>>> is simply that when something is deleted, nothing is immediately
>>> renumbered,
>>> and it's not until merge time that the renumbering happens.  (This is my
>>> understanding of how it works.)
>>>
>>> Just so I'm 100% clear, if I never delete anything, will the IDs ever
>>> change?
>>>
>>> Thanks,
>>> Chris
>>>
>>>
>>>
>>
>>
>>  ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message