lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4158) Windows Tests (4.x) hanging for 5 hrs in stall control again
Date Mon, 25 Jun 2012 08:14:43 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4158:
------------------------------------

    Attachment: LUCENE-4158.patch

This must be 'lucky Monday'. I really enjoy starting the week with a sneaky concurrency
problem, but it's even better if you can reproduce it! Well, it got even better... I reproduced
the hang with IW output enabled and a debug pointer in place! How lucky is that! Good news:
I know what happens and we can easily fix it.

Here we go... Some thread triggers a full flush, which means we take all DWPTs out of the loop
and flush them to do a reopen. At the same time we need to prevent later documents from
sneaking into the flush. That is done by preventing any DWPT that was not part of the full
flush from flushing until the full flush is done. Yet if we pile up lots of docs (or have a
smallish RAM buffer / low max buffered docs), this can easily go wrong. If we reach the
threshold for stalling while the full flush is flushing its last segment, we check whether
indexing threads can help with flushing. That is not possible, since all DWPTs that don't
belong to the full flush are blocked. So we basically cannot help and are over the stall limit,
so we put ourselves to sleep. Then the full flush finishes and gives us the wakeup call, but
nobody will ever flush one of the blocked DWPTs, since all threads are waiting.

Basically a chicken/egg problem. This patch simply removes the loop in DWStallControl and
lets the higher-level logic re-stall after we have retried flushing pending DWPT instances,
i.e. we heal ourselves.
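
To make the idea concrete, here is a minimal Java sketch. It is not the attached patch.
StallControlSketch, StallControl, hasWaiters, simulate and the counters are hypothetical names
made up for illustration; the real DocumentsWriterStallControl in the stack dump is built on
an AbstractQueuedSynchronizer, while this sketch uses a plain monitor. The sketch assumes the
old code went back to sleep inside a loop for as long as the stall flag was set. The only point
shown is that waitIfStalled() reacts to the first wakeup instead of looping, so the caller gets
a chance to retry flushing the DWPTs that were blocked during the full flush.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class StallControlSketch {

    /** Hypothetical, simplified stand-in for DocumentsWriterStallControl. */
    static final class StallControl {
        private boolean stalled;
        private int waiters;

        synchronized void updateStalled(boolean stalled) {
            this.stalled = stalled;
            notifyAll(); // the "wakeup call"; waiters re-check at a higher level
        }

        synchronized boolean hasWaiters() {
            return waiters > 0;
        }

        /** Reacts to the first wakeup: a single wait(), deliberately not a while loop. */
        synchronized void waitIfStalled() throws InterruptedException {
            if (stalled) {
                waiters++;
                try {
                    wait(); // an early or spurious return is fine, the caller re-checks
                } finally {
                    waiters--;
                }
            }
        }
    }

    /** Plays the scenario once; returns how many blocked DWPTs were left unflushed. */
    public static int simulate() {
        final StallControl stall = new StallControl();
        final AtomicInteger blockedDwpts = new AtomicInteger(3); // held back by the full flush
        final AtomicBoolean fullFlushRunning = new AtomicBoolean(true);
        stall.updateStalled(true); // over the stall limit while the full flush runs

        Thread indexer = new Thread(() -> {
            try {
                while (blockedDwpts.get() > 0) {
                    if (fullFlushRunning.get()) {
                        stall.waitIfStalled(); // cannot help flushing yet, so sleep
                    } else {
                        blockedDwpts.decrementAndGet(); // retry: flush a pending DWPT
                    }
                }
                stall.updateStalled(false); // backlog is gone, release the stall
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.setDaemon(true);
        indexer.start();

        try {
            while (!stall.hasWaiters()) {
                Thread.sleep(1); // wait until the indexer is really asleep
            }
            fullFlushRunning.set(false); // full flush done, blocked DWPTs are released
            stall.updateStalled(true);   // wakeup call, although still over the limit
            indexer.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blockedDwpts.get();
    }

    public static void main(String[] args) {
        System.out.println("blocked DWPTs left: " + simulate());
    }
}
```

If waitIfStalled() used "while (stalled) wait();" instead, the indexer in this sketch would go
straight back to sleep after the wakeup call, because the stall flag is still set, and nothing
would ever flush the backlog.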
                
> Windows Tests (4.x) hanging for 5 hrs in stall control again
> ------------------------------------------------------------
>
>                 Key: LUCENE-4158
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4158
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 5.0
>            Reporter: Uwe Schindler
>            Assignee: Simon Willnauer
>            Priority: Critical
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4158.patch, LUCENE-4158.patch
>
>
> Here the stack dump, from build (extracted with jstack on windows): http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java6-64/118/
> JVM is 1.6.0_32, 64bit, server, 2 CPUs, Windows 7 64
> {noformat}
> 2012-06-19 17:21:42
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.7-b02 mixed mode):
> "Thread 0" prio=6 tid=0x00000000070dc000 nid=0xcec waiting on condition [0x000000000e18f000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f684e8d8> (a org.apache.lucene.index.DocumentsWriterStallControl$Sync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> 	at org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:120)
> 	at org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:618)
> 	at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
> 	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
> 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1327)
> 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1078)
> 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1059)
> 	at org.apache.lucene.index.TestNRTReaderWithThreads$RunThread.run(TestNRTReaderWithThreads.java:94)
>    Locked ownable synchronizers:
> 	- None
> "TEST-TestScope-org.apache.lucene.index.TestNRTReaderWithThreads.testIndexing-seed#[641A6B3E2297F46E]" prio=6 tid=0x00000000070d9800 nid=0x448 in Object.wait() [0x00000000058de000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000f685b540> (a org.apache.lucene.index.TestNRTReaderWithThreads$RunThread)
> 	at java.lang.Thread.join(Thread.java:1186)
> 	- locked <0x00000000f685b540> (a org.apache.lucene.index.TestNRTReaderWithThreads$RunThread)
> 	at java.lang.Thread.join(Thread.java:1239)
> 	at org.apache.lucene.index.TestNRTReaderWithThreads.testIndexing(TestNRTReaderWithThreads.java:61)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
> 	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> 	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> 	at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> 	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> 	at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> 	at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> 	at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
> 	at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>    Locked ownable synchronizers:
> 	- None
> "Low Memory Detector" daemon prio=6 tid=0x000000000492f000 nid=0xfe4 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
> 	- None
> "C2 CompilerThread1" daemon prio=10 tid=0x000000000492c000 nid=0x6b0 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
> 	- None
> "C2 CompilerThread0" daemon prio=10 tid=0x00000000002ca800 nid=0xf1c waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
> 	- None
> "Attach Listener" daemon prio=10 tid=0x00000000002c9000 nid=0xcc8 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
> 	- None
> "Signal Dispatcher" daemon prio=10 tid=0x0000000004920800 nid=0x91c runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
> 	- None
> "Finalizer" daemon prio=8 tid=0x00000000002b5000 nid=0xf4c in Object.wait() [0x00000000048df000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000e01873c8> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> 	- locked <0x00000000e01873c8> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>    Locked ownable synchronizers:
> 	- None
> "Reference Handler" daemon prio=10 tid=0x00000000002ac000 nid=0xe10 in Object.wait() [0x00000000047df000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000e0156290> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:485)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> 	- locked <0x00000000e0156290> (a java.lang.ref.Reference$Lock)
>    Locked ownable synchronizers:
> 	- None
> "main" prio=6 tid=0x00000000003ae000 nid=0x558 in Object.wait() [0x0000000000e0f000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000f67e96b8> (a com.carrotsearch.randomizedtesting.RandomizedRunner$2)
> 	at java.lang.Thread.join(Thread.java:1186)
> 	- locked <0x00000000f67e96b8> (a com.carrotsearch.randomizedtesting.RandomizedRunner$2)
> 	at java.lang.Thread.join(Thread.java:1239)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:561)
> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.run(RandomizedRunner.java:521)
> 	at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:145)
> 	at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:238)
> 	at com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12)
>    Locked ownable synchronizers:
> 	- None
> "VM Thread" prio=10 tid=0x00000000002a3800 nid=0x614 runnable 
> "GC task thread#0 (ParallelGC)" prio=6 tid=0x0000000000203000 nid=0x7dc runnable 
> "GC task thread#1 (ParallelGC)" prio=6 tid=0x0000000000205000 nid=0xc20 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x0000000004938000 nid=0x720 waiting on condition

> JNI global references: 1586
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

