From: Michael McCandless
To: Ian Lea
Cc: java-user@lucene.apache.org
Subject: Re: CorruptIndexException with some versions of java
Date: Tue, 18 Mar 2008 16:53:25 -0400

Ian, can you attach your version of SegmentMerger.java? Somehow my line numbers are off from yours.

Mike

Ian Lea wrote:
> Mike
>
>
> Latest patch produces similar exception:
>
> Exception in thread "Lucene Merge Thread #0"
> org.apache.lucene.index.MergePolicy$MergeException:
> java.lang.AssertionError: after mergeFields: fdx size mismatch: 65184
> docs vs 521464 length in bytes of _c9.fdx
>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:320)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:297)
> Caused by: java.lang.AssertionError: after mergeFields: fdx size
> mismatch: 65184 docs vs 521464 length in bytes of _c9.fdx
>         at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:347)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:133)
>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3852)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3504)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:211)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:266)
>
> Latest infostream attached.
>
>
> --
> Ian.
>
>
> On Tue, Mar 18, 2008 at 6:05 PM, Michael McCandless wrote:
>>
>> Hi Ian,
>>
>> Sheesh that's odd. The SegmentMerger produced an .fdx file that is
>> one document too short.
>>
>> Can you run with this patch now, again applied to head of 2.3
>> branch? I just added another assert inside the loop that does the
>> field merging.
>>
>> I will scrutinize this code...
>>
>> Mike
>>
>>
>> Ian Lea wrote:
>>> Mike
>>>
>>>
>>> Patch applied and test re-run and picked up an assertion error this
>>> time:
>>>
>>> Exception in thread "Lucene Merge Thread #0"
>>> org.apache.lucene.index.MergePolicy$MergeException:
>>> java.lang.AssertionError: after mergeFields: fdx size mismatch: 72357
>>> docs vs 578848 length in bytes of _3o.fdx
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:320)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:297)
>>> Caused by: java.lang.AssertionError: after mergeFields: fdx size
>>> mismatch: 72357 docs vs 578848 length in bytes of _3o.fdx
>>>         at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:342)
>>>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:133)
>>>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3852)
>>>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3504)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:211)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:266)
>>>
>>> The infostream output is attached. Since this email is to you and the
>>> list it should make it to you.
>>>
>>>
>>> Yonik: I haven't been able to make TestStressIndexing2 fail.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, Mar 18, 2008 at 4:19 PM, Michael McCandless wrote:
>>>>
>>>> Ian,
>>>>
>>>> Could you apply the attached patch to the head of the 2.3 branch?
>>>>
>>>> It only adds more asserts, to try to pinpoint where exactly this
>>>> corruption starts.
>>>>
>>>> Then, re-run the test with asserts enabled and infoStream turned on
>>>> and post back. Thanks.
>>>>
>>>> Mike
>>>>
>>>>
>>>> Ian Lea wrote:
>>>>
>>>>> It's failed on servers running SuSE 10.0 and 8.2 (ancient!)
>>>>>
>>>>> $ uname -a shows
>>>>> Linux phoebe 2.6.13-15-smp #1 SMP Tue Sep 13 14:56:15 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> and
>>>>>
>>>>> Linux phobos 2.4.20-64GB-SMP #1 SMP Mon Mar 17 17:56:03 UTC 2003 i686 unknown unknown GNU/Linux
>>>>>
>>>>> The first one has a 2.8GHz Intel CPU, don't know about the second.
>>>>>
>>>>>
>>>>> I'll try and run the stress test.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Tue, Mar 18, 2008 at 2:17 PM, Yonik Seeley wrote:
>>>>>>
>>>>>> On Tue, Mar 18, 2008 at 7:38 AM, Ian Lea wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>>
>>>>>>> When bulk loading into a new index I'm seeing this exception
>>>>>>>
>>>>>>> Exception in thread "Thread-1"
>>>>>>> org.apache.lucene.index.MergePolicy$MergeException:
>>>>>>> org.apache.lucene.index.CorruptIndexException: doc counts differ for
>>>>>>> segment _4l: fieldsReader shows 67861 but segmentInfo shows 67862
>>>>>>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
>>>>>>> Caused by: org.apache.lucene.index.CorruptIndexException: doc counts
>>>>>>> differ for segment _4l: fieldsReader shows 67861 but segmentInfo
>>>>>>> shows 67862
>>>>>>>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
>>>>>>>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>>>>>>>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:221)
>>>>>>>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3093)
>>>>>>>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
>>>>>>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
>>>>>>>
>>>>>>> when using Java version 1.6.0_05-b13 or 1.6.0_04-b12 on Linux, with
>>>>>>> Lucene 2.3.0 or 2.3.1 or lucene-core-2.3-SNAPSHOT from yesterday.
>>>>>>>
>>>>>>> With Java version 1.6.0_03-b05 things work fine.
>>>>>>>
>>>>>>> The exception happens a few hundred thousand documents into the
>>>>>>> load.
>>>>>>>
>>>>>>> A different program updating a different index with different data on
>>>>>>> a different server gave a similar error on version 1.6.0_05-b13 and
>>>>>>> Lucene 2.3.0.
>>>>>>>
>>>>>>> Any ideas? Is this maybe a known issue or am I missing
>>>>>>> something obvious?
>>>>>>
>>>>>> My guess is perhaps a thread safety bug, more likely in Lucene
>>>>>> indexing code (less likely in the JVM or specific libc).
>>>>>>
>>>>>> What Linux version are you using?
>>>>>> What hardware are you running on (specifically, the CPU)?
>>>>>>
>>>>>> If possible, it would be great if you could check out Lucene trunk,
>>>>>> crank up the iterations by modifying TestStressIndexing2, maybe
>>>>>> fiddle with some of the other parameters in
>>>>>> TestStressIndexing2.testMultiConfig(), and see if you can get it to
>>>>>> fail.
>>>>>>
>>>>>>
>>>>>> -Yonik
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
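
The two "fdx size mismatch" assertion messages quoted in this thread can be checked with a little arithmetic, which is consistent with Mike's remark that the .fdx file is one document too short. The sketch below is only illustrative: it assumes the .fdx file holds one 8-byte pointer per document and no header (an assumption, though both reported byte counts agree with it), and FdxSizeCheck and its methods are hypothetical names, not part of Lucene.

```java
// Illustrative sketch only. Assumes one 8-byte entry per document in the
// .fdx file and no file header; FdxSizeCheck is a hypothetical helper,
// not a Lucene class.
class FdxSizeCheck {
    static final long BYTES_PER_DOC = 8L;

    // Length in bytes the .fdx file should have for the given doc count.
    static long expectedLength(long docCount) {
        return docCount * BYTES_PER_DOC;
    }

    // Number of documents the .fdx file actually has room for.
    static long docsInFile(long fdxLengthInBytes) {
        return fdxLengthInBytes / BYTES_PER_DOC;
    }

    // How many documents short the .fdx file is relative to the doc count.
    static long missingDocs(long docCount, long fdxLengthInBytes) {
        return docCount - docsInFile(fdxLengthInBytes);
    }

    public static void main(String[] args) {
        // {docCount, fdx length in bytes} pairs taken from the two
        // assertion messages in this thread (_c9.fdx and _3o.fdx).
        long[][] reports = { {65184L, 521464L}, {72357L, 578848L} };
        for (long[] r : reports) {
            System.out.println(r[0] + " docs: expected " + expectedLength(r[0])
                + " bytes, found " + r[1] + " bytes, short by "
                + missingDocs(r[0], r[1]) + " doc(s)");
        }
    }
}
```

Under that assumption, both failing merges come out exactly 8 bytes (one document) short, which matches the off-by-one in the original CorruptIndexException (67861 vs 67862).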