lucene-java-user mailing list archives

From Renaud Delbru <renaud.del...@deri.org>
Subject Flex API - Debugging Segment Merge
Date Thu, 25 Mar 2010 16:55:07 GMT
Hi,

I am currently benchmarking various compression algorithms using the Sep 
Codec, but I get an index corruption exception during the merge process, 
and I would like your help to debug it.

I have reimplemented various algorithms (FOR, Simple9, VInt, PFor) 
for the Sep IntBlock Codec, and I am now benchmarking them on the Wikipedia 
dataset. With some algorithms (FOR, Simple9, etc.) I don't encounter any 
problems. With the PFor algorithms, however, I get a CorruptIndexException 
during the merge process (in SepPostingsWriterImpl#startDoc) 
because documents are out of order:

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (153 <= 153 )
         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:471)
         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:435)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (153 <= 153 )
         at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.startDoc(SepPostingsWriterImpl.java:177)

However, this happens only when I index the Wikipedia 
dataset using the PFor algorithm. I have tried to reproduce the error 
in a unit test that creates random documents and performs a merge, but 
in that case the error does not appear.
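
A block-level round-trip check is one way to test an encoder independently of the merge. The following is only a minimal sketch under stated assumptions: IntBlockCodec is a hypothetical interface (not a Lucene API), VByteCodec is a plain variable-byte placeholder standing in for the codec under test, and the value distribution (mostly small deltas with occasional large outliers) is an assumption meant to exercise exception handling in PFor-style encoders.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical stand-in for a block codec under test; not a Lucene interface. */
interface IntBlockCodec {
    byte[] encode(int[] block);
    int[] decode(byte[] data, int count);
}

/** Simple variable-byte codec, used here only as a known-good placeholder. */
final class VByteCodec implements IntBlockCodec {
    @Override
    public byte[] encode(int[] block) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : block) {
            // Emit 7 bits per byte; the high bit is set while more bytes follow.
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    @Override
    public int[] decode(byte[] data, int count) {
        int[] block = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int shift = 0;
            int v = 0;
            byte b;
            do {
                b = data[pos++];
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            block[i] = v;
        }
        return block;
    }
}

final class RoundTripCheck {
    /**
     * Encodes and decodes random blocks and returns the index of the first
     * round whose decoded block differs from its input, or -1 if none differs.
     */
    static int firstFailingRound(IntBlockCodec codec, int blockSize, int rounds, long seed) {
        Random random = new Random(seed);
        for (int round = 0; round < rounds; round++) {
            int[] block = new int[blockSize];
            for (int i = 0; i < blockSize; i++) {
                // Mostly small deltas, with an occasional large outlier.
                block[i] = random.nextInt(20) == 0
                        ? 1 + random.nextInt(1 << 24)
                        : 1 + random.nextInt(16);
            }
            int[] decoded = codec.decode(codec.encode(block), blockSize);
            if (!Arrays.equals(block, decoded)) {
                return round;
            }
        }
        return -1;
    }
}
```

Calling RoundTripCheck.firstFailingRound(codec, 128, 1000, seed) with a fixed seed makes a failing block reproducible; the block size and round count here are arbitrary.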

After some debugging, I have noticed that the document id at the end of a 
segment is the same as (or lower than) the document id of the next 
segment to be merged. However, even with Codec.DEBUG=true, I am 
unable to tell which segments are faulty, or which terms inside these 
segments are affected. Could you point me to an easy way to get this 
information, so that I can check these segments and their encoded 
blocks to find and understand the problem?
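
To make the kind of information being asked for concrete, here is a minimal sketch of a check that scans an already decoded doc id sequence and reports the first position where the ids stop increasing. It assumes the doc ids of one term in one segment are available as an int array; DocOrderCheck and the segment and term labels are hypothetical names for illustration, not part of Lucene.

```java
/** Hypothetical helper; the segment and term labels are supplied by the caller. */
final class DocOrderCheck {
    /**
     * Returns the index of the first doc id that is not strictly greater than
     * its predecessor, or -1 if the sequence is strictly increasing.
     */
    static int firstOutOfOrder(int[] docIds) {
        for (int i = 1; i < docIds.length; i++) {
            if (docIds[i] <= docIds[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a one-line report naming the segment, the term and the position. */
    static String describe(String segment, String term, int[] docIds) {
        int i = firstOutOfOrder(docIds);
        if (i < 0) {
            return "segment=" + segment + " term=" + term + ": doc ids in order";
        }
        return "segment=" + segment + " term=" + term + ": docs out of order ("
                + docIds[i] + " <= " + docIds[i - 1] + ") at position " + i;
    }
}
```

Running such a check per term while reading each source segment would point at the block to inspect, assuming the decoded ids can be obtained before they reach the writer.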

Thanks in advance,
-- 
Renaud Delbru
