lucene-java-user mailing list archives

From Renaud Delbru <renaud.del...@deri.org>
Subject Re: Flex API - Debugging Segment Merge
Date Thu, 25 Mar 2010 19:04:16 GMT
Hi Michael,

On 25/03/10 18:45, Michael McCandless wrote:
> Hi Renaud,
>
> It's great that you're pushing flex forward so much :) You're making
> some cool sounding codecs!  I'm really looking forward to seeing
> indexing/searching performance results on Wikipedia...
>    
I'll share them for sure whenever the results are ready ;o).
> It sounds most likely there's a bug in the PFor impl? (Since you don't
> hit this exception with the others...).
>    
It seems so, but I also find it strange that I cannot reproduce it with
synthetic data.
> During merge, each segment's docIDs are rebased according to how many
> non-deleted docs there are in all prior segments.  One possibility
> here is a given segment thought it had N deletions but in fact
> encountered fewer than N while iterating its docs.  This would cause
> the next segment to have too-low a base which can cause this exact
> exception on crossing from one segment to the next.  (Ie the very
> first doc of the next segment will suddenly be <= prior doc(s)).
>
> But... if that's happening (ie, bug is in Lucene not in PFor impl),
> you'd expect the other codecs to hit it too.
>
> Are you using multiple threads for indexing?  Are you also mixing in
> deletions (or updateDocument calls)?
>    
There are no deletions: I just create the index from scratch, and each
document I add has a unique identifier.
I am using a single thread for indexing: it reads the list of Wikipedia
articles sequentially, puts the content into a single field, and adds
the document to the index. A commit is done every 10K documents.
I have tried different mergeFactors (2 and 20), but whenever the
first merge occurs, I get this CorruptIndexException.

I will continue debugging, but if I could at least get the faulty
segment and the faulty term (or even better, the index of the faulty
block), I would be able to display the content of the blocks and see
whether there are problems in the PFor encoding.
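
To make the docID rebasing described in the quoted text concrete, here
is a minimal, self-contained sketch. It is not Lucene code: the names
Segment, claimedLiveDocs, rebase and firstOutOfOrder are hypothetical,
and it assumes no real deletions, so local docIDs are only shifted by a
base. It shows how a segment that reports too few live docs gives the
next segment a base that is too low, so that the first doc of the next
segment is <= the prior doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocIdRebaseSketch {

    /** Hypothetical stand-in for one segment's postings for a single term. */
    static final class Segment {
        final int[] localDocIds;   // strictly increasing within the segment
        final int claimedLiveDocs; // live-doc count the segment reports

        Segment(int[] localDocIds, int claimedLiveDocs) {
            this.localDocIds = localDocIds;
            this.claimedLiveDocs = claimedLiveDocs;
        }
    }

    /** Shifts each local docID by the live-doc counts of all prior segments. */
    static List<Integer> rebase(List<Segment> segments) {
        List<Integer> merged = new ArrayList<>();
        int base = 0;
        for (Segment s : segments) {
            for (int local : s.localDocIds) {
                merged.add(base + local);
            }
            // The next base trusts the reported count, not what was iterated.
            base += s.claimedLiveDocs;
        }
        return merged;
    }

    /** Returns the first position whose docID is <= its predecessor, or -1. */
    static int firstOutOfOrder(List<Integer> docIds) {
        for (int i = 1; i < docIds.size(); i++) {
            if (docIds.get(i) <= docIds.get(i - 1)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Consistent case: the first segment holds 3 docs and reports 3.
        List<Segment> consistent = Arrays.asList(
            new Segment(new int[] {0, 1, 2}, 3),
            new Segment(new int[] {0, 1}, 2));
        System.out.println(firstOutOfOrder(rebase(consistent)));

        // Inconsistent case: the first segment reports only 2 live docs
        // (as if it expected one deletion) but 3 docs are iterated.
        List<Segment> inconsistent = Arrays.asList(
            new Segment(new int[] {0, 1, 2}, 2),
            new Segment(new int[] {0, 1}, 2));
        System.out.println(firstOutOfOrder(rebase(inconsistent)));
    }
}
```

In the inconsistent case the order breaks exactly at the first doc of
the second segment, which is the symptom described in the quoted text.
The same kind of check, applied to decoded blocks, is what I have in
mind for locating the faulty block.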

Cheers,
-- 
Renaud Delbru


