lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3855) TestStressNRT failures (reproducible)
Date Fri, 09 Mar 2012 20:12:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226393#comment-13226393 ]

Michael McCandless commented on LUCENE-3855:
--------------------------------------------

bq. Mike, would it help if we dumped a linear sequence of each thread's ops on
indexwriter/segmentinfos, whatever else?

Thanks Dawid!

I actually know the root cause here:
{noformat}
Uncaught exception by thread: Thread[Lucene Merge Thread #72,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError
	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:509)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480)
Caused by: java.lang.AssertionError
	at org.apache.lucene.index.IndexWriter.commitMergedDeletes(IndexWriter.java:3028)
	at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3137)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3718)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3257)
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
{noformat}

After that assert trips, all kinds of crazy other exceptions can happen (not-closed files,
not-live SegmentInfo, etc.).

After the merge finishes, which can take a long time, commitMergedDeletes revisits each
source segment so we can "carry forward" any new deletions recorded against that segment
to the newly merged segment. In an active NRT app there can be many deletes to carry forward...
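
As a rough illustration of that carry-forward step, here is a minimal Java sketch. It is not
the actual commitMergedDeletes code: CarryForwardDeletes, SegmentState and the two live-docs
snapshots are made-up names for the example, and it assumes the merged segment simply holds
the docs that were live at merge start, in segment order.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; not Lucene's commitMergedDeletes.
final class CarryForwardDeletes {

    /** Live-docs snapshots for one source segment: a set bit means "doc is live". */
    static final class SegmentState {
        final int maxDoc;
        final BitSet liveAtMergeStart; // what the merge actually copied
        final BitSet liveNow;          // state once the merge has finished

        SegmentState(int maxDoc, BitSet liveAtMergeStart, BitSet liveNow) {
            this.maxDoc = maxDoc;
            this.liveAtMergeStart = liveAtMergeStart;
            this.liveNow = liveNow;
        }
    }

    /**
     * Returns the docIDs to delete in the merged segment (set bit = deleted),
     * i.e. the docs that were copied by the merge but deleted while it ran.
     */
    static BitSet carryForward(List<SegmentState> segments) {
        BitSet mergedDeletes = new BitSet();
        int mergedDocID = 0; // next docID in the merged segment
        for (SegmentState seg : segments) {
            for (int doc = 0; doc < seg.maxDoc; doc++) {
                if (!seg.liveAtMergeStart.get(doc)) {
                    continue; // deleted before the merge started, so never copied
                }
                if (!seg.liveNow.get(doc)) {
                    mergedDeletes.set(mergedDocID); // deleted during the merge
                }
                mergedDocID++;
            }
        }
        return mergedDeletes;
    }
}
```

The sketch only works if both snapshots for every source segment are still reachable when
the merge completes, which is why the pooled per-segment state must not be dropped while
the merge is running.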

The assert that tripped verifies that the ReadersAndLiveDocs (RLD) is still present in IW's
ReaderPool; it's supposed to remain present throughout merging because we had incRef'd the
SegmentReader we opened for merging.

But it can in fact be dropped (the bug here) if another thread opens a reader, applies
deletes, and decRef's the reader, all 1) after the merge thread acquired the RLD but 2) before
it opened the mergeReader from it.

I (accidentally!!) caused this with LUCENE-3631, where we moved writeable deletes from SegmentReader
into IndexWriter.  I suspect we need to add a separate refCount to RLD to fix this... I'm
working on that.
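
To make the idea of a separate reference count on a pooled entry concrete, here is a minimal
Java sketch. It is purely illustrative and is not the eventual fix: RefCountedPool, Entry,
acquire, release and contains are made-up names, not the API of IndexWriter's ReaderPool or
of ReadersAndLiveDocs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names do not correspond to Lucene classes.
final class RefCountedPool {

    /** Pooled per-segment state with its own reference count. */
    static final class Entry {
        final String segmentName;
        private int refCount; // guarded by the pool's monitor

        Entry(String segmentName) {
            this.segmentName = segmentName;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Returns the entry for the segment (creating it if absent) and takes a reference. */
    synchronized Entry acquire(String segmentName) {
        Entry e = entries.computeIfAbsent(segmentName, Entry::new);
        e.refCount++;
        return e;
    }

    /** Gives a reference back; the entry leaves the pool only when no holder remains. */
    synchronized void release(Entry e) {
        if (e.refCount <= 0) {
            throw new IllegalStateException("entry for " + e.segmentName + " over-released");
        }
        if (--e.refCount == 0) {
            entries.remove(e.segmentName);
        }
    }

    synchronized boolean contains(String segmentName) {
        return entries.containsKey(segmentName);
    }
}
```

With a scheme like this, a merge thread that has called acquire keeps the entry in the pool
even if another thread acquires and releases the same entry in between, which is the
interleaving described above.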
                
> TestStressNRT failures (reproducible)
> -------------------------------------
>
>                 Key: LUCENE-3855
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3855
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: hoss-r1298470-fixed-seed__TEST-org.apache.lucene.index.TestStressNRT.xml, output1.log, output2.log, output3.log, output4.log
>
>
> Build server logs. Reproduces on at least two machines.
> {noformat}
>     [junit] ------------- Standard Error -----------------
>     [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT -Dtestmethod=test -Dtests.seed=69468941c1bbf693:19e66d58475da929:69e9d2f81769b6d0 -Dargs="-Dfile.encoding=UTF-8"
>     [junit] NOTE: test params are: codec=Lucene3x, sim=RandomSimilarityProvider(queryNorm=true,coord=false): {}, locale=ro, timezone=Etc/GMT+1
>     [junit] NOTE: all tests run in this JVM:
>     [junit] [TestStressNRT]
>     [junit] NOTE: Linux 3.0.0-16-generic amd64/Sun Microsystems Inc. 1.6.0_27 (64-bit)/cpus=2,threads=1,free=74960064,total=135987200
>     [junit] ------------- ---------------- ---------------
>     [junit] Testcase: test(org.apache.lucene.index.TestStressNRT):	Caused an ERROR
>     [junit] MockDirectoryWrapper: cannot close: there are still open files: {_ng.cfs=8}
>     [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_ng.cfs=8}
>     [junit] 	at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:555)
>     [junit] 	at org.apache.lucene.index.TestStressNRT.test(TestStressNRT.java:385)
>     [junit] 	at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:743)
>     [junit] 	at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:639)
>     [junit] 	at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
>     [junit] 	at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:538)
>     [junit] 	at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:600)
>     [junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164)
>     [junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
>     [junit] 	at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
>     [junit] 	at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
>     [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput: _ng.cfs
>     [junit] 	at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:479)
>     [junit] 	at org.apache.lucene.store.MockDirectoryWrapper$1.openSlice(MockDirectoryWrapper.java:777)
>     [junit] 	at org.apache.lucene.store.CompoundFileDirectory.openInput(CompoundFileDirectory.java:221)
>     [junit] 	at org.apache.lucene.codecs.lucene3x.TermInfosReader.<init>(TermInfosReader.java:112)
>     [junit] 	at org.apache.lucene.codecs.lucene3x.Lucene3xFields.<init>(Lucene3xFields.java:84)
>     [junit] 	at org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat$1.<init>(PreFlexRWPostingsFormat.java:51)
>     [junit] 	at org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat.fieldsProducer(PreFlexRWPostingsFormat.java:51)
>     [junit] 	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:108)
>     [junit] 	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:51)
>     [junit] 	at org.apache.lucene.index.IndexWriter$ReadersAndLiveDocs.getMergeReader(IndexWriter.java:521)
>     [junit] 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3587)
>     [junit] 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3257)
>     [junit] 	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
>     [junit] 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
>     [junit] 
>     [junit] 
>     [junit] Test org.apache.lucene.index.TestStressNRT FAILED
> {noformat}

