lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Noll <dan...@nuix.com>
Subject Re: Document ID shuffling under 2.3.x (on merge?)
Date Wed, 12 Mar 2008 01:29:37 GMT
On Wednesday 12 March 2008 10:20:12 Michael McCandless wrote:
> Oh, so you do not see the problem with SerialMergeScheduler but you
> do with ConcurrentMergeScheduler?
[...]
> Oh, there are no deletions?  Then this is very strange.  Is it  
> optimize that messes up the docIDs?  Or, is it when you add docs  
> after having done an optimize that the newly added docs are messed up?

Ah, my bad.  It happens with both merge schedulers actually, but not during 
normal merges during the indexing, only with optimize().  Also, we're not 
adding docs after calling optimize either.  We're adding them all and merging 
along the way, and then calling optimize() once at the end.  If I comment out 
that one call to optimize() the problem seems to go away entirely.  Although 
to be honest it was happening once before and it looked like it went away, 
and we only just discovered it had returned.

> Hmmm ... optimize does record which segments need to be merged away
> in a HashSet.  Then ConcurrentMergeScheduler will run the necessary
> merges (possibly several at once).  But the merges are still done on
> "contiguous" segments, and when committed the newly merged segment
> replaces that range of segments.  So I don't think this should be re-
> ordering documents.  Can you try running with infoStream set such
> that you get the problem to occur and then post the resulting output?

Attached.

I have filtered out lines in the log which indicated an exception adding the 
document; these occur when our Reader throws an IOException and there were so 
many that it bloated the file.

Daniel

Mime
View raw message