cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2603) node stuck in 'Down' in nodetool ring, until disablegossip/enablegossip flapped it back into submission
Date Wed, 04 May 2011 10:03:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028695#comment-13028695 ]

Peter Schuller commented on CASSANDRA-2603:
-------------------------------------------

I should note that I do not know whether this only affects the nodetool ring output or whether
it affects RPC traffic. If only nodetool ring, then it's a minor bug. If it affects routing
of messages, it's IMO major. I have not checked the code for whether a discrepancy between
ring output and routing is plausible.

> node stuck in 'Down' in nodetool ring, until disablegossip/enablegossip flapped it back into submission
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2603
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>            Reporter: Peter Schuller
>
> Cluster with 0.7.4 and 9 machines. I was doing rolling restarts so nodes were expected to have flapped up/down a bit.
> After cleanup, I noticed that 'nodetool ring' on one of the nodes claimed that another node was Down. I'll call the node that considered the *other* one to be down "UpNode" and the node that was considered *down* "DownNode".
> DownNode was the next successor on the ring relative to UpNode. Only UpNode thought it was down; all other members of the cluster agreed it was up. This stayed the case for almost 24 hours.
> In system.log on UpNode, it is clearly visible that DownNode flapped to state UP recently with no notification of flapping to state DOWN afterwards. Yet 'nodetool ring' reported Down.
> Today, I did disablegossip+wait-for-a-bit+enablegossip on DownNode. This caused 'nodetool ring' on UpNode to again reflect the reality that DownNode is in fact up.
> I do not have a reproducible test case but wanted to file it since I don't remember seeing, and didn't easily find, a JIRA bug indicating a bug with this effect has recently been fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
