lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: Document ID shuffling under 2.3.x (on merge?)
Date Tue, 11 Mar 2008 23:20:12 GMT

Daniel Noll wrote:

> On Tuesday 11 March 2008 19:55:39 Michael McCandless wrote:
>> Hi Daniel,
>> 2.3 should be no different from 2.2 in that docIDs only "shift" when
>> a merge of segments with deletions completes.
>> Could it be the ConcurrentMergeScheduler?  Merges now run in the
>> background by default and commit whenever they complete.  You can get
>> back to the previous (blocking) behavior by using
>> SerialMergeScheduler instead.
> That was my first thought, but SerialMergeScheduler doesn't cause  
> the problem.
> I've done a little more investigation since; it turns out that if I  
> don't
> call optimize() then the problem doesn't occur.

Oh, so you do not see the problem with SerialMergeScheduler but you  
do with ConcurrentMergeScheduler?

> Could it be that optimize(int,boolean) is storing the segments to  
> optimise in
> a HashSet, which by its nature reorders the segments?

Hmmm ... optimize does record which segments need to be merged away  
in a HashSet.  Then ConcurrentMergeScheduler will run the necessary  
merges (possibly several at once).  But the merges are still done on  
"contiguous" segments, and when committed the newly merged segment  
replaces that range of segments.  So I don't think this should be re- 
ordering documents.  Can you try running with infoStream set such  
that you get the problem to occur and then post the resulting output?

>> If it's not that ... can you provide more details about how your
>> applications is relying on docIDs?
> As far as that, we assume that if there are N documents in the  
> index then the
> next document ID will be N (we determine this before adding the  
> document.)
> As we're only doing this in a single thread and we never delete  
> documents,
> this was previously safe.

Oh, there are no deletions?  Then this is very strange.  Is it  
optimize that messes up the docIDs?  Or, is it when you add docs  
after having done an optimize that the newly added docs are messed up?


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message