cassandra-commits mailing list archives

From "Richard Low (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8336) Quarantine nodes after receiving the gossip shutdown message
Date Sat, 28 Feb 2015 02:18:04 GMT


Richard Low commented on CASSANDRA-8336:

Here it is:

ERROR [main] 2015-02-27 18:11:57,584 (line 513) Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(
        at org.apache.cassandra.service.StorageService.prepareToJoin(
        at org.apache.cassandra.service.StorageService.initServer(
        at org.apache.cassandra.service.StorageService.initServer(
        at org.apache.cassandra.service.CassandraDaemon.setup(
        at org.apache.cassandra.service.CassandraDaemon.activate(
        at org.apache.cassandra.service.CassandraDaemon.main(
 INFO [StorageServiceShutdownHook] 2015-02-27 18:11:57,605 (line 1370) Announcing
ERROR [StorageServiceShutdownHook] 2015-02-27 18:11:57,607 (line 199) Exception in thread Thread[StorageServiceShutdownHook,5,main]
        at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(
        at org.apache.cassandra.gms.Gossiper.stop(
        at org.apache.cassandra.service.StorageService$1.runMayThrow(

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>                 Key: CASSANDRA-8336
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.13
>         Attachments: 8336-v2.txt, 8336-v3.txt, 8336.txt
> In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here is that
> this isn't sufficient; you can still get TimedOutExceptions (TOEs) and have to wait on the
> failure detector (FD) to figure things out. This happens due to gossip propagation time and
> variance: if node X shuts down and sends the message to Y, but Z has a greater gossip version
> than Y for X and has not yet received the message, Z can initiate gossip with Y and thus mark
> X alive again. I propose quarantining to solve this; however, I feel it should be a -D
> parameter you have to specify, so as not to destroy current dev and test practices, since
> this will mean a node that shuts down will not be able to restart until the quarantine expires.
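The race described above can be modelled in a few lines. The following Java is a minimal, hypothetical sketch and not Cassandra's Gossiper: the class and method names, the 60-second quarantine length, and the `sketch.gossip.quarantine` system property are all invented for illustration, and each endpoint's state is reduced to a single version number plus an alive flag (real gossip state also carries a generation). It shows that without quarantine Y adopts Z's higher version for X and marks X alive again, while with quarantine Y ignores that update until the quarantine expires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the shutdown/gossip race; NOT Cassandra's Gossiper.
class QuarantineSketch {
    // Hypothetical quarantine length in milliseconds (an assumption for illustration).
    static final long QUARANTINE_MS = 60_000L;

    // Simplified gossip state one node holds about another endpoint.
    record EndpointState(int version, boolean alive) {}

    static final class Node {
        final Map<String, EndpointState> states = new HashMap<>();
        // endpoint -> time (ms) until which gossip updates about it are ignored
        final Map<String, Long> quarantinedUntil = new HashMap<>();
        final boolean quarantineEnabled;

        Node(boolean quarantineEnabled) {
            this.quarantineEnabled = quarantineEnabled;
        }

        // Direct receipt of the shutdown announcement: mark the endpoint down
        // without bumping its version, and optionally start a quarantine.
        void onShutdown(String endpoint, long now) {
            EndpointState old = states.get(endpoint);
            int version = old == null ? 0 : old.version();
            states.put(endpoint, new EndpointState(version, false));
            if (quarantineEnabled) {
                quarantinedUntil.put(endpoint, now + QUARANTINE_MS);
            }
        }

        // Simplified merge: the higher version wins unless the endpoint is quarantined.
        void applyRemote(String endpoint, EndpointState remote, long now) {
            Long until = quarantinedUntil.get(endpoint);
            if (until != null && now < until) {
                return; // still quarantined: ignore the stale remote view
            }
            EndpointState local = states.get(endpoint);
            if (local == null || remote.version() > local.version()) {
                states.put(endpoint, remote);
            }
        }

        boolean isAlive(String endpoint) {
            EndpointState s = states.get(endpoint);
            return s != null && s.alive();
        }
    }

    // Returns whether Y considers X alive after Z gossips with Y at gossipTime.
    static boolean yThinksXAlive(boolean quarantineEnabled, long gossipTime) {
        Node y = new Node(quarantineEnabled);
        Node z = new Node(quarantineEnabled);
        y.states.put("X", new EndpointState(10, true)); // Y's view of X
        z.states.put("X", new EndpointState(12, true)); // Z is ahead of Y for X
        y.onShutdown("X", 0L);                          // X tells only Y that it is shutting down
        y.applyRemote("X", z.states.get("X"), gossipTime); // Z then gossips with Y
        return y.isAlive("X");
    }

    // Opt-in switch mirroring the "-D parameter" idea; the property name is hypothetical.
    static boolean quarantineFromSystemProperty() {
        return Boolean.getBoolean("sketch.gossip.quarantine");
    }

    public static void main(String[] args) {
        boolean enabled = quarantineFromSystemProperty();
        System.out.println("quarantine enabled: " + enabled);
        System.out.println("X alive on Y after gossip from Z: " + yThinksXAlive(enabled, 1_000L));
    }
}
```

Run with `-Dsketch.gossip.quarantine=true` to enable the quarantine path. Keeping the switch opt-in reflects the trade-off noted in the description: a quarantined node cannot rejoin until the quarantine expires, which would slow down quick restart cycles in development and testing.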

This message was sent by Atlassian JIRA
