cassandra-commits mailing list archives

From "Oleg Anastasyev (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8138) replace_address cannot find node to be replaced node after seed node restart
Date Tue, 21 Oct 2014 15:21:34 GMT


Oleg Anastasyev commented on CASSANDRA-8138:

This happens because the dead node's tokens, host id and DC:RACK are loaded from the system
tables only into TokenMetadata on startup, not into gossip's state. The loading code only
calls Gossip.addSavedEndpoint(InetAddr), which adds just the inet address of the dead node
with generation 0.
If the dead node has not participated in gossip since the restart, there are no TOKENS,
HOST_ID, etc. app states for it in EndpointState.
But replace_address uses the gossip shadow round to collect the necessary information about
the dead node so that it can replace it, and all it can get from gossip is the inet address.
There is also a bug in Gossip.examineGossiper that prevents even this from being sent to the
replacing node, so in fact the replacing node gets no information about the dead node at all,
as if it had never existed.
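
To make the gap concrete, here is a minimal sketch of the startup behaviour described above.
It is a simplified model, not Cassandra's code: the class ShadowRoundSketch, its fields, and
the methods loadSavedEndpoint and tokensVisibleInGossip are hypothetical names invented for
illustration, and the address, tokens and host id values are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model; these are not Cassandra's real classes.
class ShadowRoundSketch {
    // Stand-in for gossip's per-endpoint state: a generation plus application states.
    static final class EndpointState {
        final int generation;
        final Map<String, String> appStates = new HashMap<>();

        EndpointState(int generation) {
            this.generation = generation;
        }
    }

    final Map<String, String> tokenMetadata = new HashMap<>();      // address -> tokens
    final Map<String, EndpointState> gossipState = new HashMap<>(); // address -> state

    // Models the startup path described above: tokens reach token metadata,
    // while gossip only learns the address, with generation 0.
    void loadSavedEndpoint(String address, String tokens, String hostId) {
        tokenMetadata.put(address, tokens);
        // Neither tokens nor hostId are copied into gossip's app states.
        gossipState.put(address, new EndpointState(0));
    }

    // What a replacing node could learn about the tokens from this seed's gossip state.
    Optional<String> tokensVisibleInGossip(String address) {
        EndpointState state = gossipState.get(address);
        if (state == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(state.appStates.get("TOKENS"));
    }

    public static void main(String[] args) {
        ShadowRoundSketch seed = new ShadowRoundSketch();
        seed.loadSavedEndpoint("10.0.0.5", "100,200", "host-id-1");
        System.out.println("in token metadata: " + seed.tokenMetadata.containsKey("10.0.0.5"));
        System.out.println("tokens in gossip: " + seed.tokensVisibleInGossip("10.0.0.5").isPresent());
    }
}
```

In this model the seed knows the dead node's tokens locally, but nothing it could hand out
during a shadow round carries them.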

I believe the same would apply to a bootstrapping node if there was a full cluster restart
after some node had died and a new node is then added to the cluster. It would lead to
wrong token metadata on the freshly bootstrapped node (I have not tested this case, though).

> replace_address cannot find node to be replaced node after seed node restart
> ----------------------------------------------------------------------------
>                 Key: CASSANDRA-8138
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oleg Anastasyev
>         Attachments: ReplaceAfterSeedRestart.txt
> If a node has failed and the cluster is restarted (a common case in massive outages),
> replace_address fails with
> {code}
> Caused by: java.lang.RuntimeException: Cannot replace_address / because it doesn't exist in gossip
> jvm 1    | 	at org.apache.cassandra.service.StorageService.prepareReplacementInfo(
> jvm 1    | 	at org.apache.cassandra.service.StorageService.joinTokenRing(
> jvm 1    | 	at org.apache.cassandra.service.StorageService.initServer(
> jvm 1    | 	at org.apache.cassandra.service.StorageService.initServer(
> {code}
> Although the necessary information is saved in system tables on seed nodes, it is not loaded
> into gossip on the seed node, so a replacement node cannot get this info.
> The attached patch loads all information from system tables into gossip with generation 0 and
> fixes some bugs around this info in the shadow gossip round.
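
The idea behind the patch can be sketched as follows. This is not the attached patch: the
class SavedStateSketch and its methods loadSavedEndpoint and applyRemoteState are
hypothetical, and the merge rule (a strictly higher generation replaces the stored state) is
a simplifying assumption used to show why generation 0 is a safe choice for saved data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the patch's idea; not the attached patch itself.
class SavedStateSketch {
    static final class EndpointState {
        final int generation;
        final Map<String, String> appStates;

        EndpointState(int generation, Map<String, String> appStates) {
            this.generation = generation;
            this.appStates = new HashMap<>(appStates);
        }
    }

    final Map<String, EndpointState> gossipState = new HashMap<>(); // address -> state

    // Publish what was saved in system tables into gossip with generation 0.
    void loadSavedEndpoint(String address, String tokens, String hostId) {
        Map<String, String> saved = new HashMap<>();
        saved.put("TOKENS", tokens);
        saved.put("HOST_ID", hostId);
        gossipState.put(address, new EndpointState(0, saved));
    }

    // Assumed merge rule: state with a strictly higher generation replaces the stored one.
    void applyRemoteState(String address, EndpointState remote) {
        EndpointState local = gossipState.get(address);
        if (local == null || remote.generation > local.generation) {
            gossipState.put(address, remote);
        }
    }

    public static void main(String[] args) {
        SavedStateSketch seed = new SavedStateSketch();
        seed.loadSavedEndpoint("10.0.0.5", "100,200", "host-id-1");
        // A replacing node asking this seed would now find the saved tokens.
        System.out.println(seed.gossipState.get("10.0.0.5").appStates.get("TOKENS"));
    }
}
```

Under that assumption, any state gossiped by a node that actually comes back carries a
higher generation and simply overrides the saved copy.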

