cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8336) Quarantine nodes after receiving the gossip shutdown message
Date Fri, 12 Dec 2014 22:12:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244880#comment-14244880 ]

Brandon Williams commented on CASSANDRA-8336:
---------------------------------------------

Perhaps one thing we could do is put the node into hibernation before it sends the shutdown
message.  This way, it will never get marked alive regardless of the heartbeat, even if the
state propagates later.  We might want a new dead state for that, though, since I don't want
to overload the hibernation state with too many functions; that would complicate knowing what
state a node is really in.
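A minimal Java sketch of the idea above, under loudly stated assumptions: every class, method, and field here is hypothetical and is not Cassandra's `Gossiper` API. Generation and heartbeat version are collapsed into one `version` number, `HIBERNATE` mirrors the existing hibernation status, and `SHUTDOWN` stands in for the new dead state the comment suggests. It shows why publishing a dead state before the announcement helps: once Y holds X's dead state, a peer's older heartbeat is stale and cannot revive X.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "publish a dead state before the shutdown message"; not Cassandra's API.
public class DeadStateSketch {
    // HIBERNATE mirrors the existing status; SHUTDOWN is a stand-in for the proposed new dead state.
    enum Status { NORMAL, HIBERNATE, SHUTDOWN }

    static final class EndpointState {
        final long version;   // assumption: one number standing in for generation + heartbeat version
        final Status status;
        EndpointState(long version, Status status) { this.version = version; this.status = status; }
    }

    private final Map<String, EndpointState> known = new HashMap<>();
    private final Map<String, Boolean> alive = new HashMap<>();

    static boolean isDeadState(Status s) {
        return s == Status.HIBERNATE || s == Status.SHUTDOWN;
    }

    /** Merge gossiped state: a newer version replaces ours and normally marks the endpoint alive. */
    void onGossip(String endpoint, EndpointState remote) {
        EndpointState local = known.get(endpoint);
        if (local != null && remote.version <= local.version) {
            return; // not newer than what we already hold, so ignore it
        }
        known.put(endpoint, remote);
        // An endpoint in a dead state is never marked alive, however far its version advanced.
        alive.put(endpoint, !isDeadState(remote.status));
    }

    /** The shutdown announcement only flips local liveness; it carries no gossip state itself. */
    void onShutdownMessage(String endpoint) {
        alive.put(endpoint, false);
    }

    boolean isAlive(String endpoint) {
        return alive.getOrDefault(endpoint, false);
    }

    public static void main(String[] args) {
        // Current behavior: Y holds X@5, gets the shutdown message, then Z gossips X@6 and revives X.
        DeadStateSketch y1 = new DeadStateSketch();
        y1.onGossip("X", new EndpointState(5, Status.NORMAL));
        y1.onShutdownMessage("X");
        y1.onGossip("X", new EndpointState(6, Status.NORMAL));
        System.out.println("without dead state, X alive on Y: " + y1.isAlive("X"));

        // Proposal: X publishes SHUTDOWN at version 7 before announcing, so Z's X@6 is stale on Y.
        DeadStateSketch y2 = new DeadStateSketch();
        y2.onGossip("X", new EndpointState(7, Status.SHUTDOWN));
        y2.onShutdownMessage("X");
        y2.onGossip("X", new EndpointState(6, Status.NORMAL));
        System.out.println("with dead state, X alive on Y: " + y2.isAlive("X"));
    }
}
```

Because the dead state is part of X's own gossiped state rather than a one-off message, it is X's final published state, so any node that eventually receives it stops treating heartbeat-only updates as a sign of life. Keeping `SHUTDOWN` separate from `HIBERNATE` reflects the comment's concern about overloading one status with several meanings.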

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.12
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement.  The problem is that this
> isn't sufficient; you can still get TOEs (TimedOutExceptions) and have to wait on the
> failure detector (FD) to figure things out.  This happens due to gossip propagation time
> and variance: if node X shuts down and sends the message to Y, but Z has a greater gossip
> version than Y for X and has not yet received the message, Z can initiate gossip with Y,
> causing Y to mark X alive again.  I propose quarantining to solve this; however, I feel it
> should be a -D parameter you have to specify, so as not to break current dev and test
> practices, since this will mean a node that shuts down will not be able to restart until
> the quarantine expires.
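The race and the proposed quarantine can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Cassandra's implementation: the class, its methods, the system property name `sketch.quarantine_on_shutdown`, and the 60-second `QUARANTINE_MILLIS` are all hypothetical (the issue names neither a property nor a duration). Generation and heartbeat version are collapsed into one `version` number, and the clock is passed in explicitly so the timing is deterministic. The opt-in flag is read with the standard `Boolean.getBoolean`, matching the idea of a -D parameter you have to specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of quarantining an endpoint after its shutdown announcement; not Cassandra's API.
public class ShutdownQuarantineSketch {
    // Hypothetical -D flag name; the issue only says quarantine should be opt-in via a -D parameter.
    static final String QUARANTINE_PROPERTY = "sketch.quarantine_on_shutdown";
    // Hypothetical quarantine length; the issue does not give one.
    static final long QUARANTINE_MILLIS = 60_000L;

    private final boolean quarantineEnabled;
    private final Map<String, Long> quarantinedUntil = new HashMap<>();
    private final Map<String, Long> knownVersion = new HashMap<>();
    private final Map<String, Boolean> alive = new HashMap<>();

    ShutdownQuarantineSketch(boolean quarantineEnabled) {
        this.quarantineEnabled = quarantineEnabled;
    }

    /** Enabled only when started with -Dsketch.quarantine_on_shutdown=true. */
    static ShutdownQuarantineSketch fromSystemProperty() {
        return new ShutdownQuarantineSketch(Boolean.getBoolean(QUARANTINE_PROPERTY));
    }

    /** X announced its shutdown to us: mark it down and, if enabled, start its quarantine. */
    void onShutdownMessage(String endpoint, long nowMillis) {
        alive.put(endpoint, false);
        if (quarantineEnabled) {
            quarantinedUntil.put(endpoint, nowMillis + QUARANTINE_MILLIS);
        }
    }

    /** State for an endpoint arrives from any peer (e.g. Z gossiping with us). */
    void onGossip(String endpoint, long version, long nowMillis) {
        Long until = quarantinedUntil.get(endpoint);
        if (until != null) {
            if (nowMillis < until) {
                return; // still quarantined: ignore it, even if the version is newer
            }
            quarantinedUntil.remove(endpoint); // quarantine expired
        }
        Long local = knownVersion.get(endpoint);
        if (local != null && version <= local) {
            return; // stale state
        }
        knownVersion.put(endpoint, version);
        alive.put(endpoint, true); // a newer version is taken as a sign of life
    }

    boolean isAlive(String endpoint) {
        return alive.getOrDefault(endpoint, false);
    }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {false, true}) {
            ShutdownQuarantineSketch y = new ShutdownQuarantineSketch(enabled);
            y.onGossip("X", 5, 0);           // Y knows X at version 5
            y.onShutdownMessage("X", 1_000); // X tells Y it is shutting down
            y.onGossip("X", 6, 2_000);       // Z, holding X at version 6, gossips with Y
            System.out.println("quarantine=" + enabled + ", X alive after Z's gossip: " + y.isAlive("X"));
            y.onGossip("X", 7, 30_000);      // X restarts inside the quarantine window
            System.out.println("  alive after early restart: " + y.isAlive("X"));
        }
    }
}
```

With the flag off, Z's newer version revives X on Y, which is the race described above. With it on, Y ignores all state for X until the window closes, including state from a legitimately restarted X; that is the dev and test cost the issue cites as the reason for making quarantine opt-in.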



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
