Return-Path: Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: (qmail 85169 invoked from network); 29 Mar 2011 15:30:35 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 29 Mar 2011 15:30:35 -0000 Received: (qmail 80918 invoked by uid 500); 29 Mar 2011 15:30:34 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 80862 invoked by uid 500); 29 Mar 2011 15:30:34 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 80855 invoked by uid 99); 29 Mar 2011 15:30:34 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Mar 2011 15:30:34 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [74.125.82.176] (HELO mail-wy0-f176.google.com) (74.125.82.176) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Mar 2011 15:30:29 +0000 Received: by wyb40 with SMTP id 40so357120wyb.35 for ; Tue, 29 Mar 2011 08:30:07 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.242.134 with SMTP id i6mr4403342wer.81.1301412600413; Tue, 29 Mar 2011 08:30:00 -0700 (PDT) Received: by 10.227.156.143 with HTTP; Tue, 29 Mar 2011 08:30:00 -0700 (PDT) In-Reply-To: References: Date: Tue, 29 Mar 2011 11:30:00 -0400 Message-ID: Subject: Re: Test failure on 3x From: Michael McCandless To: dev@lucene.apache.org Cc: Shai Erera Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable OK the failure is because of the interaction of CMS and LogMP. I think we should just cut this test back to SMS, since it's specifically testing the invariants of LogMP in a "controlled" setting. I'll commit that... The problem happens because a BG merge finishes before (in time) the merges before it (in space), and the LogMP is called again, and it sees the index in this state, and then picks an "invalid" (according to its strict invariants) merge. I think this is harmless in practice. Mike http://blog.mikemccandless.com On Mon, Mar 28, 2011 at 12:02 PM, Michael McCandless wrote: > Shai, > > This has also been failing on trunk, but I can't repro it, so it's > awesome you can! > > Can you turn on IW's infoStream and get the failure to happen and post > the results? > > I agree this is likely a CMS thing... > > Mike > > http://blog.mikemccandless.com > > On Mon, Mar 28, 2011 at 8:15 AM, Shai Erera wrote: >> Turns out this is not easily reproducible. While I am able to reproduce = it >> *many times* on my machine, it's not *always*. I did notice a strange >> behavior -- when the test fails, the stack trace points to what seems to= be >> an incorrect line. I don't know why, even though I 'ant clean' and 'svn = up' >> etc. >> >> Anyway, I ported some debug messages from trunk to 3x, and with that, I = see >> that the failure is on the line "assertTrue(docCount > lowerBound)" in >> checkInvariants. Below is the stacktrace (note that it points at an >> incorrect line #222 which is not the assert that fails): >> >> java.lang.AssertionError: docCount=3D10 lowerBound=3D100 i=3D7 segmentCo= unt=3D11 >> index=3D_1e(3.2):Cv1081 _1z(3.2):Cv1064 _2k(3.2):Cv1425 _35(3.2):Cv1510 >> _3q(3.2):cv190 _64(3.2):cv550 _5f(3.2):cv10 _5g(3.2):cv10 _61(3.2):cv190 >> _62(3.2):cv10 _63(3.2):cv10 >> =A0=A0=A0 at org.junit.Assert.fail(Assert.java:91) >> =A0=A0=A0 at org.junit.Assert.assertTrue(Assert.java:43) >> =A0=A0=A0 at >> org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestI= ndexWriterMergePolicy.java:222) >> =A0=A0=A0 at >> org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsCh= ange(TestIndexWriterMergePolicy.java:168) >> >> I believe that the fact it's not always reproducible is related to CMS. = I >> ran the test with -Dtests.iter=3D10 and it failed a couple of times with= CMS, >> but if I switch to SMS and run with -Dtests.iter=3D100 it succeeds. >> >> The test includes this code: >> >> {code} >> =A0=A0=A0 writer.commit(); >> =A0=A0=A0 writer.waitForMerges(); >> =A0=A0=A0 writer.commit(); >> {code} >> >> It's clear that the test wants to want for CMS to finish all its merges. >> However, according to waitForMerges' javadocs: "It is guaranteed that an= y >> merges started *prior* to calling this method will have completed once t= his >> method completes." >> >> What if a merge starts after waitForMerges is called (due to cascaded >> merges) -- will waitForMerges still wait for it? To check that I tried t= he >> following: >> >> 1) I added another call to waitForMerges following the second commit(), = but >> that didn't help (out of 100 runs, I had 3 failures). >> 2) I add this call instead: ((ConcurrentMergeScheduler) >> writer.getConfig().getMergeScheduler()).sync(); but it didn't help eithe= r (9 >> out of 100 failures) >> 3) Tried to close the writer and open it, still failures. >> >> So I'm not sure that the problem is related to waitForMerges, however it >> clearly has something to do with CMS, because all iterations pass with S= MS. >> >> Shai >> >> On Mon, Mar 28, 2011 at 7:00 AM, Shai Erera wrote: >>> >>> I ran tests on 3x today and hit this: >>> >>> =A0=A0=A0 [junit] Testsuite: org.apache.lucene.index.TestIndexWriterMer= gePolicy >>> =A0=A0=A0 [junit] Testcase: >>> testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergeP= olicy): >>> FAILED >>> =A0=A0=A0 [junit] >>> =A0=A0=A0 [junit] junit.framework.AssertionFailedError: >>> =A0=A0=A0 [junit]=A0=A0=A0=A0 at >>> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(Luc= eneTestCase.java:1076) >>> =A0=A0=A0 [junit]=A0=A0=A0=A0 at >>> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(Luc= eneTestCase.java:1008) >>> =A0=A0=A0 [junit]=A0=A0=A0=A0 at >>> org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(Test= IndexWriterMergePolicy.java:221) >>> =A0=A0=A0 [junit]=A0=A0=A0=A0 at >>> org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsC= hange(TestIndexWriterMergePolicy.java:168) >>> =A0=A0=A0 [junit] >>> =A0=A0=A0 [junit] >>> =A0=A0=A0 [junit] Tests run: 6, Failures: 1, Errors: 0, Time elapsed: 2= .777 sec >>> =A0=A0=A0 [junit] >>> =A0=A0=A0 [junit] ------------- Standard Error ----------------- >>> =A0=A0=A0 [junit] NOTE: reproduce with: ant test >>> -Dtestcase=3DTestIndexWriterMergePolicy -Dtestmethod=3DtestMaxBufferedD= ocsChange >>> -Dtests.seed=3D-4805406970232115944:2175520643927646734 >>> =A0=A0=A0 [junit] NOTE: test params are: locale=3Del_CY_PREEURO, >>> timezone=3DAmerica/Dominica >>> =A0=A0=A0 [junit] NOTE: all tests run in this JVM: >>> =A0=A0=A0 [junit] [TestCharArrayMap, TestLengthFilter, TestTeeSinkToken= Filter, >>> TestBinaryDocument, TestByteSlices, TestDocumentWriter, >>> TestIndexReaderClone, TestIndexWriterMergePolicy] >>> =A0=A0=A0 [junit] NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation = 1.6.0 >>> (64-bit)/cpus=3D2,threads=3D3,free=3D4734480,total=3D13416448 >>> >>> I was able to reproduce it with the 'reproduce' line only on 3x. Also >>> reproducible if it's the only test that's running. I can look at it in = 3-4 >>> hours from now, so am posting this to the list in case someone gets to = it >>> before me. >>> >>> Shai >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org