lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4161) deadlock if commit+newSearcher occurs during core close, can happen as a result of snappuller (occurred in TestReplicationHandler)
Date Mon, 10 Dec 2012 20:01:21 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528225#comment-13528225 ]

Mark Miller commented on SOLR-4161:
-----------------------------------

I think I just fixed this. See the commits from a short while ago and my reply on the dev list
about the replication handler test failure.

More details to follow.
                
> deadlock if commit+newSearcher occurs during core close, can happen as a result of snappuller (occurred in TestReplicationHandler)
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4161
>                 URL: https://issues.apache.org/jira/browse/SOLR-4161
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: dump2.txt, dump3.txt, dump4.txt, dump5.txt
>
>
> There appears to be a lock-related bug in the DefaultSolrCoreState/DirectUpdateHandler2
> interactions. If CoreContainer is shutting down the core at the same time that some other
> thread attempts a commit which triggers a newSearcher, then DefaultSolrCoreState.closeIndexWriter
> and DefaultSolrCoreState.getIndexWriter deadlock.
> This has been observed in TestReplicationHandler, but doesn't appear to be related to
> any bugs in that test case, so it seems like it could easily affect real-life users as well.
> Summary of the deadlock stacks; see attachments for full details...
> {noformat}
> Found one Java-level deadlock:
> =============================
> "snapPuller-422-thread-1":
>   waiting to lock monitor 0x00007f5a2011a9e0 (object 0x00000000f5f485a0, a org.apache.solr.update.DefaultSolrCoreState),
>   which is held by "TEST-TestReplicationHandler.test-seed#[1B46F52130C14E03]"
> "TEST-TestReplicationHandler.test-seed#[1B46F52130C14E03]":
>   waiting for ownable synchronizer 0x00000000f60fe5c8, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "snapPuller-422-thread-1"
> Java stack information for the threads listed above:
> ===================================================
> "snapPuller-422-thread-1":
> 	at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:77)
> 	- waiting to lock <0x00000000f5f485a0> (a org.apache.solr.update.DefaultSolrCoreState)
> 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1358)
> 	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
> 	- locked <0x00000000f5f485d0> (a java.lang.Object)
> 	at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:655)
> 	at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:454)
> 	at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
> 	at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:222)
> ...
> "TEST-TestReplicationHandler.test-seed#[1B46F52130C14E03]":
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f60fe5c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
> 	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> 	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> 	at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
> 	at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
> 	- locked <0x00000000f5f485a0> (a org.apache.solr.update.DefaultSolrCoreState)
> 	at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:259)
> 	- locked <0x00000000f5f485a0> (a org.apache.solr.update.DefaultSolrCoreState)
> 	at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:879)
> 	- locked <0x00000000f5f485a0> (a org.apache.solr.update.DefaultSolrCoreState)
> 	at org.apache.solr.core.SolrCore.close(SolrCore.java:971)
> 	at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:723)
> {noformat}
> Original Report...
> {quote}
> while testing out another patch i noticed "stalled" heartbeat messages getting logged
> by TestReplicationHandler.test and started taking some stack traces to see if it was in the
> code i was working on.
> it's not, so i suspect it's unrelated to the changes i'm looking at, but i did notice
> that there was a full on deadlock reported, so i wanted to make sure it got tracked.
> {quote}
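
For readers following the two stacks in the summary above: they look like a lock-order inversion. One thread holds a ReentrantLock and waits for the DefaultSolrCoreState monitor, while the other holds that monitor and waits for the same ReentrantLock. The following is only a minimal sketch of that pattern, not Solr code and not the fix referred to in the comment. The class name, the thread names, and the two lock objects are hypothetical stand-ins. It reproduces the inversion on daemon threads and confirms it with ThreadMXBean.findDeadlockedThreads(), which is assumed to be supported by the running JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDeadlockSketch {

    /**
     * Starts two daemon threads that take the same two locks in opposite
     * order, then polls the JVM's deadlock detector. Returns how many of
     * the two threads are reported as deadlocked (0 if none within ~5s).
     */
    public static int reproduce() throws InterruptedException {
        // Hypothetical stand-in for the DefaultSolrCoreState monitor.
        final Object coreStateMonitor = new Object();
        // Hypothetical stand-in for the ReentrantLock seen in the dump.
        final ReentrantLock commitLock = new ReentrantLock();
        // Ensures each thread holds its first lock before trying the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors the snapPuller stack: holds the ReentrantLock, wants the monitor.
        Thread committer = new Thread(() -> {
            commitLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (coreStateMonitor) {
                    // never reached once the inversion occurs
                }
            } finally {
                commitLock.unlock();
            }
        }, "sketch-committer");

        // Mirrors the core-close stack: holds the monitor, wants the ReentrantLock.
        Thread closer = new Thread(() -> {
            synchronized (coreStateMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                commitLock.lock(); // blocks forever
                commitLock.unlock();
            }
        }, "sketch-closer");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        committer.setDaemon(true);
        closer.setDaemon(true);
        committer.start();
        closer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            // Covers both object monitors and ownable synchronizers.
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                int ours = 0;
                for (long id : ids) {
                    if (id == committer.getId() || id == closer.getId()) {
                        ours++;
                    }
                }
                if (ours > 0) {
                    return ours;
                }
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked sketch threads: " + reproduce());
    }
}
```

The usual remedies for this pattern are to acquire the two locks in one consistent order everywhere, or to avoid holding one of them while calling into code that takes the other. Which approach the actual commits took is not stated in this message.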

