lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Document ID shuffling under 2.3.x (on merge?)
Date Wed, 12 Mar 2008 08:36:57 GMT

Daniel Noll wrote:

> I have filtered out lines in the log which indicated an exception
> adding the document; these occur when our Reader throws an
> IOException and there were so many that it bloated the file.

OK, I think this is very likely the issue: when IndexWriter hits an
exception while processing a document, the portion of the document
already indexed is left in the index, and its docID is then marked
for deletion.  You can see these deletions in your infoStream:

   flush 0 buffered deleted terms and 30 deleted docIDs on 20 segments

This means your index contains deletions, by docID, and so when
you optimize, the deleted documents are dropped and the remaining
docIDs are renumbered (compacted) to close the gaps.
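
As a rough illustration of why docIDs shift, here is a toy model in
Java of what compaction does to the numbering.  It is not Lucene's
code: the class name DocIdCompaction and the method remap are made up
for this sketch, and it assumes a single segment whose surviving
documents keep their relative order.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy model of docID compaction. Hypothetical names; not Lucene code. */
final class DocIdCompaction {

    /**
     * Maps each old docID to its docID after deleted documents are dropped.
     * Deleted documents map to -1. Assumes surviving documents keep their
     * relative order.
     */
    static int[] remap(int maxDoc, BitSet deleted) {
        int[] oldToNew = new int[maxDoc];
        int next = 0; // next free docID in the compacted numbering
        for (int oldId = 0; oldId < maxDoc; oldId++) {
            oldToNew[oldId] = deleted.get(oldId) ? -1 : next++;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        // Pretend documents 2 and 5 hit an exception and were marked deleted.
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(5);
        System.out.println(Arrays.toString(remap(8, deleted)));
    }
}
```

With 8 documents and docIDs 2 and 5 deleted, this prints
[0, 1, -1, 2, 3, -1, 4, 5]: every document that sits after a deleted
one moves down, so any docID you stored before the optimize may now
point at a different document.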

Mike
