cassandra-commits mailing list archives

From "Peter Schuller (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3832) gossip stage backed up due to migration manager future de-ref
Date Sun, 05 Feb 2012 23:01:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200947#comment-13200947 ]

Peter Schuller commented on CASSANDRA-3832:
-------------------------------------------

Meanwhile, MigrationStage is stuck like this:

{code}
"MigrationStage:1" daemon prio=10 tid=0x00007fb5b450e800 nid=0x3395 waiting on condition [0x0000000043479000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000005032ed688> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
	at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:61)
	at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:119)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{code}

The GossipStage submits the job to the migration stage on the local node and waits for the
result. The migration stage in turn sends a message and waits for the response synchronously.

The migration request runs on the migration stage on the remote node, which is presumably
stuck with its own task on the migration stage.

In effect, we are causing a distributed deadlock (or a near-deadlock, I'm not sure; I suppose
we might get unstuck eventually, since things do time out after the RPC timeout).
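
To make the pattern concrete, here is a minimal sketch of it, not the actual Cassandra code. It assumes each node's migration stage is a single-threaded executor; the class name StageDeadlockSketch, the method names, and the 200 ms timeout are all hypothetical. Each simulated node occupies its only stage thread while waiting synchronously for a reply that can only be produced on the peer's equally occupied stage thread, so nothing moves until a timeout fires.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Cassandra.
public class StageDeadlockSketch {

    // Returns how many of the two simulated nodes gave up with a timeout.
    public static int runScenario(long rpcTimeoutMillis) throws Exception {
        // One thread per stage, standing in for the migration stage on each node.
        ExecutorService stageA = Executors.newSingleThreadExecutor();
        ExecutorService stageB = Executors.newSingleThreadExecutor();
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Callable<Boolean> taskA = () -> requestFromPeer(stageB, bothBusy, rpcTimeoutMillis);
            Callable<Boolean> taskB = () -> requestFromPeer(stageA, bothBusy, rpcTimeoutMillis);
            Future<Boolean> a = stageA.submit(taskA);
            Future<Boolean> b = stageB.submit(taskB);
            int timedOut = 0;
            if (a.get()) timedOut++;
            if (b.get()) timedOut++;
            return timedOut;
        } finally {
            stageA.shutdownNow();
            stageB.shutdownNow();
        }
    }

    // Runs on one node's stage thread: asks the peer's stage for a reply and
    // blocks on it, keeping the local stage thread occupied the whole time.
    // Returns true if the wait ended in a timeout.
    static boolean requestFromPeer(ExecutorService peerStage, CountDownLatch bothBusy,
                                   long timeoutMillis) throws InterruptedException {
        // Make sure both stage threads are occupied before either request is sent.
        bothBusy.countDown();
        bothBusy.await();
        // The peer's only stage thread is busy, so this request just queues up.
        Future<String> reply = peerStage.submit(() -> "schema");
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // unstuck only because the wait is bounded
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("nodes that timed out: " + runScenario(200));
    }
}
```

At least one of the two waits has to end in a timeout, because neither reply can run until one of the stage threads is released. Whether the second node also times out depends on timing. With an unbounded wait in place of the timed get, the two stages would never make progress.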

                
> gossip stage backed up due to migration manager future de-ref 
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3832
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3832
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> This is just bootstrapping a ~ 180 trunk cluster. After a while, a
> node I was on was stuck with thinking all nodes are down, because
> gossip stage was backed up, because it was spending a long time
> (multiple seconds or more, I suppose RPC timeout maybe) doing the
> following. Cluster-wide restart -> back to normal. I have not
> investigated further.
> {code}
> "GossipStage:1" daemon prio=10 tid=0x00007f9d5847a800 nid=0xa6fc waiting on condition [0x000000004345f000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000005029ad1c0> (a java.util.concurrent.FutureTask$Sync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> 	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> 	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:364)
> 	at org.apache.cassandra.service.MigrationManager.rectifySchema(MigrationManager.java:132)
> 	at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:75)
> 	at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:802)
> 	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:918)
> 	at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
