From: Renaud Delbru
To: java-user
Date: Thu, 25 Mar 2010 16:55:07 +0000
Subject: Flex API - Debugging Segment Merge

Hi,

I am currently benchmarking various compression algorithms using the Sep codec, but I get an index corruption exception during the merge process, and I need your help to debug it.

I have reimplemented various algorithms (FOR, Simple9, VInt, PFor) for the Sep IntBlock codec, and I am now benchmarking them on the Wikipedia dataset. With some algorithms, such as FOR and Simple9, I don't encounter any problems. But with the PFor algorithm, I get a CorruptIndexException during the merge process (in SepPostingsWriterImpl#startDoc), because documents are out of order:

    Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (153 <= 153 )
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:471)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:435)
    Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (153 <= 153 )
        at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.startDoc(SepPostingsWriterImpl.java:177)

However, this happens only when I index the Wikipedia dataset with the PFor algorithm. I have tried to reproduce the error in a unit test, creating random documents and performing a merge, but in that case the error does not appear. After some debugging, I noticed that the first document ID of the next segment to be merged is the same as (or lower than) the last document ID of the previous segment.
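A minimal sketch of what the failing check amounts to, under stated assumptions: during a merge, each segment's local doc IDs are shifted by a docBase (assumed here to be the sum of maxDoc of the earlier segments, ignoring deletions), and the postings writer requires every doc ID to be strictly greater than the previous one. `SegmentPostings` and `mergePostings` are hypothetical names, not Lucene's API. The extra range check (local doc ID < maxDoc) is an assumed debugging aid: a decoder that returns a wrong value at the end of one segment's postings would only surface as "out of order" in the next segment, whereas the range check names the segment and term that actually produced the bad value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeDocIdCheck {

    /** Hypothetical holder: one segment's name, maxDoc, and local doc IDs for a term. */
    static final class SegmentPostings {
        final String name;
        final int maxDoc;
        final int[] docs;

        SegmentPostings(String name, int maxDoc, int[] docs) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.docs = docs;
        }
    }

    /**
     * Concatenates one term's postings from several segments into the merged
     * doc ID space (docBase = sum of maxDoc of earlier segments; deletions ignored).
     * Fails fast with the term and segment name on the first invalid doc ID.
     */
    static List<Integer> mergePostings(String term, List<SegmentPostings> segments) {
        List<Integer> merged = new ArrayList<>();
        int docBase = 0;   // offset of the current segment in the merged segment
        int lastDoc = -1;  // last merged doc ID written
        for (SegmentPostings seg : segments) {
            for (int local : seg.docs) {
                // Range check: a decoded local ID >= maxDoc spills into the next
                // segment's range and would otherwise surface there as "out of order".
                if (local < 0 || local >= seg.maxDoc) {
                    throw new IllegalStateException("doc " + local + " out of range [0, "
                        + seg.maxDoc + ") term=" + term + " segment=" + seg.name);
                }
                int doc = docBase + local;
                // Same condition as the "docs out of order (doc <= lastDoc)" error.
                if (doc <= lastDoc) {
                    throw new IllegalStateException("docs out of order (" + doc + " <= "
                        + lastDoc + ") term=" + term + " segment=" + seg.name);
                }
                merged.add(doc);
                lastDoc = doc;
            }
            docBase += seg.maxDoc;
        }
        return merged;
    }

    public static void main(String[] args) {
        List<SegmentPostings> ok = Arrays.asList(
            new SegmentPostings("_0", 100, new int[] {3, 50, 99}),
            new SegmentPostings("_1", 60, new int[] {0, 53}));
        System.out.println(mergePostings("foo", ok)); // prints [3, 50, 99, 100, 153]

        // Simulated codec bug: segment _0 decodes a local doc ID equal to its maxDoc.
        List<SegmentPostings> bad = Arrays.asList(
            new SegmentPostings("_0", 100, new int[] {3, 50, 100}),
            new SegmentPostings("_1", 60, new int[] {0, 53}));
        try {
            mergePostings("foo", bad);
        } catch (IllegalStateException e) {
            // prints: doc 100 out of range [0, 100) term=foo segment=_0
            System.out.println(e.getMessage());
        }
    }
}
```

Without the range check, the same bad input would fail one segment later with `docs out of order (100 <= 100)`, which is the kind of collision at a segment boundary described above.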
However, even with Codec.DEBUG=true, I am unable to tell which segments are faulty, or which terms inside those segments. Could you point me to an easy way to get this information, so that I can check these segments and their encoded blocks to find and understand the problem?

Thanks in advance,
-- 
Renaud Delbru
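Since the random-document unit test does not reproduce the failure, one option is to test the block codec in isolation with a round-trip check: encode a block, decode it, and compare. The sketch below uses a plain FOR encoder (frame of reference plus fixed-width bit packing) only as a stand-in for the PFor implementation; the class name and the `encode`/`decode`/`checkRoundTrip` signatures are hypothetical and do not correspond to Lucene's IntBlock interfaces. The generator mixes many small values with occasional large outliers, the shape that makes PFor store exceptions, and varies the block length so that partially filled blocks are exercised too.

```java
import java.util.Arrays;
import java.util.Random;

public class ForBlockRoundTrip {

    /** FOR encoding (values must be non-empty): out[0] = min, out[1] = bit width, then (v - min) packed LSB-first. */
    static int[] encode(int[] values) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        int bits = 32 - Integer.numberOfLeadingZeros(max - min); // 0 when all values are equal
        int[] out = new int[2 + (values.length * bits + 31) / 32];
        out[0] = min;
        out[1] = bits;
        for (int i = 0; i < values.length; i++) {
            long delta = (values[i] - min) & 0xFFFFFFFFL;
            long pos = (long) i * bits;
            int word = (int) (pos >>> 5), offset = (int) (pos & 31);
            out[2 + word] |= (int) (delta << offset);
            if (offset + bits > 32) {                 // value straddles two words
                out[3 + word] |= (int) (delta >>> (32 - offset));
            }
        }
        return out;
    }

    /** Decodes `count` values produced by encode(). */
    static int[] decode(int[] packed, int count) {
        int min = packed[0], bits = packed[1];
        int[] out = new int[count];
        if (bits == 0) {                              // all values equal to min
            Arrays.fill(out, min);
            return out;
        }
        long mask = (bits == 32) ? 0xFFFFFFFFL : (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long pos = (long) i * bits;
            int word = (int) (pos >>> 5), offset = (int) (pos & 31);
            long chunk = packed[2 + word] & 0xFFFFFFFFL;
            if (offset + bits > 32) {                 // pull in the high part from the next word
                chunk |= (packed[3 + word] & 0xFFFFFFFFL) << 32;
            }
            out[i] = min + (int) ((chunk >>> offset) & mask);
        }
        return out;
    }

    /** Encodes and decodes random blocks; throws with the seed and block on the first mismatch. */
    static void checkRoundTrip(long seed, int numBlocks, int maxBlockSize) {
        Random rnd = new Random(seed);
        for (int b = 0; b < numBlocks; b++) {
            int[] block = new int[1 + rnd.nextInt(maxBlockSize)];   // includes partial blocks
            for (int i = 0; i < block.length; i++) {
                block[i] = rnd.nextInt(16);                         // mostly small values
            }
            if (rnd.nextInt(4) == 0) {                              // occasional large outlier
                block[rnd.nextInt(block.length)] = rnd.nextInt(Integer.MAX_VALUE);
            }
            int[] decoded = decode(encode(block), block.length);
            if (!Arrays.equals(block, decoded)) {
                throw new AssertionError("seed=" + seed + " block=" + b + " in="
                    + Arrays.toString(block) + " out=" + Arrays.toString(decoded));
            }
        }
    }

    public static void main(String[] args) {
        checkRoundTrip(42L, 100_000, 128);
        System.out.println("all blocks round-trip");
    }
}
```

Reporting the seed and the offending block makes any failure reproducible; the intended use is to swap the FOR methods for the PFor encoder and decoder under test.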