lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Test failure on 3x
Date Tue, 29 Mar 2011 15:30:00 GMT
OK the failure is because of the interaction of CMS and LogMP.

I think we should just cut this test back to SMS, since it's
specifically testing the invariants of LogMP in a "controlled"
setting.  I'll commit that...

The problem happens because a BG merge finishes before (in time) the
merges before it (in space), and the LogMP is called again, and it
sees the index in this state, and then picks an "invalid" (according
to its strict invariants) merge.

I think this is harmless in practice.

Mike

http://blog.mikemccandless.com

On Mon, Mar 28, 2011 at 12:02 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Shai,
>
> This has also been failing on trunk, but I can't repro it, so it's
> awesome you can!
>
> Can you turn on IW's infoStream and get the failure to happen and post
> the results?
>
> I agree this is likely a CMS thing...
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Mon, Mar 28, 2011 at 8:15 AM, Shai Erera <serera@gmail.com> wrote:
>> Turns out this is not easily reproducible. While I am able to reproduce it
>> *many times* on my machine, it's not *always*. I did notice a strange
>> behavior -- when the test fails, the stack trace points to what seems to be
>> an incorrect line. I don't know why, even though I 'ant clean' and 'svn up'
>> etc.
>>
>> Anyway, I ported some debug messages from trunk to 3x, and with that, I see
>> that the failure is on the line "assertTrue(docCount > lowerBound)" in
>> checkInvariants. Below is the stacktrace (note that it points at an
>> incorrect line #222 which is not the assert that fails):
>>
>> java.lang.AssertionError: docCount=10 lowerBound=100 i=7 segmentCount=11
>> index=_1e(3.2):Cv1081 _1z(3.2):Cv1064 _2k(3.2):Cv1425 _35(3.2):Cv1510
>> _3q(3.2):cv190 _64(3.2):cv550 _5f(3.2):cv10 _5g(3.2):cv10 _61(3.2):cv190
>> _62(3.2):cv10 _63(3.2):cv10
>>     at org.junit.Assert.fail(Assert.java:91)
>>     at org.junit.Assert.assertTrue(Assert.java:43)
>>     at
>> org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:222)
>>     at
>> org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:168)
>>
>> I believe that the fact it's not always reproducible is related to CMS. I
>> ran the test with -Dtests.iter=10 and it failed a couple of times with CMS,
>> but if I switch to SMS and run with -Dtests.iter=100 it succeeds.
>>
>> The test includes this code:
>>
>> {code}
>>     writer.commit();
>>     writer.waitForMerges();
>>     writer.commit();
>> {code}
>>
>> It's clear that the test wants to want for CMS to finish all its merges.
>> However, according to waitForMerges' javadocs: "It is guaranteed that any
>> merges started *prior* to calling this method will have completed once this
>> method completes."
>>
>> What if a merge starts after waitForMerges is called (due to cascaded
>> merges) -- will waitForMerges still wait for it? To check that I tried the
>> following:
>>
>> 1) I added another call to waitForMerges following the second commit(), but
>> that didn't help (out of 100 runs, I had 3 failures).
>> 2) I add this call instead: ((ConcurrentMergeScheduler)
>> writer.getConfig().getMergeScheduler()).sync(); but it didn't help either (9
>> out of 100 failures)
>> 3) Tried to close the writer and open it, still failures.
>>
>> So I'm not sure that the problem is related to waitForMerges, however it
>> clearly has something to do with CMS, because all iterations pass with SMS.
>>
>> Shai
>>
>> On Mon, Mar 28, 2011 at 7:00 AM, Shai Erera <serera@gmail.com> wrote:
>>>
>>> I ran tests on 3x today and hit this:
>>>
>>>     [junit] Testsuite: org.apache.lucene.index.TestIndexWriterMergePolicy
>>>     [junit] Testcase:
>>> testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergePolicy):
>>> FAILED
>>>     [junit]
>>>     [junit] junit.framework.AssertionFailedError:
>>>     [junit]     at
>>> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1076)
>>>     [junit]     at
>>> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1008)
>>>     [junit]     at
>>> org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:221)
>>>     [junit]     at
>>> org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:168)
>>>     [junit]
>>>     [junit]
>>>     [junit] Tests run: 6, Failures: 1, Errors: 0, Time elapsed: 2.777 sec
>>>     [junit]
>>>     [junit] ------------- Standard Error -----------------
>>>     [junit] NOTE: reproduce with: ant test
>>> -Dtestcase=TestIndexWriterMergePolicy -Dtestmethod=testMaxBufferedDocsChange
>>> -Dtests.seed=-4805406970232115944:2175520643927646734
>>>     [junit] NOTE: test params are: locale=el_CY_PREEURO,
>>> timezone=America/Dominica
>>>     [junit] NOTE: all tests run in this JVM:
>>>     [junit] [TestCharArrayMap, TestLengthFilter, TestTeeSinkTokenFilter,
>>> TestBinaryDocument, TestByteSlices, TestDocumentWriter,
>>> TestIndexReaderClone, TestIndexWriterMergePolicy]
>>>     [junit] NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0
>>> (64-bit)/cpus=2,threads=3,free=4734480,total=13416448
>>>
>>> I was able to reproduce it with the 'reproduce' line only on 3x. Also
>>> reproducible if it's the only test that's running. I can look at it in 3-4
>>> hours from now, so am posting this to the list in case someone gets to it
>>> before me.
>>>
>>> Shai
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message