cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-6615) Changing the IP of a node on a live cluster leaves gossip infos and throws Exceptions
Date Wed, 29 Jan 2014 03:54:09 GMT


Brandon Williams updated CASSANDRA-6615:

    Attachment: 6615.txt

Host ID conflicts are roughly as important as token conflicts and need to be handled the same
way: decisively. We can decide who has won a host ID conflict, much as we do for a token
conflict. Once the loser is removed from tMD (TokenMetadata), we can simply let the FD (failure
detector) mark it dead, after which it will be evicted as a fat client (which is how it worked
before we added host IDs). However, after CASSANDRA-4375 this will take quite a while, since the
only sample the FD has is the seed value of 30s. That is actually acceptable as long as we've
removed the node from tMD, but we can do better, so we call removeEndpoint, which in turn removes
it from the FD but doesn't mark its epstate (endpoint state) as dead. isFatClient began checking
whether the epstate was dead in CASSANDRA-5378, but this doesn't seem necessary: the timestamp is
updated if the node is actually alive, and the duration check will prevent it from being expired.
This patch therefore removes that check.
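The decision flow described above can be modeled roughly as follows. This is a minimal, self-contained Java sketch, not the code in 6615.txt: the class and method names (`HostIdConflictSketch`, `resolveHostIdConflict`, `isExpiredFatClient`), the per-endpoint `generation` field, and the rule that the endpoint with the later startup generation wins are all hypothetical (the comment only says the winner is chosen much as for a token conflict). A plain map from host ID to endpoint stands in for tMD, and the fat-client check deliberately has no "is the epstate dead?" test, mirroring the change described.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class HostIdConflictSketch {
    // Hypothetical stand-in for a gossip endpoint: address plus startup generation.
    record Endpoint(String address, long generation) {}

    // Hypothetical stand-in for tMD: which endpoint currently owns each host ID.
    final Map<UUID, Endpoint> tokenMetadata = new HashMap<>();

    // Hypothetical last-heard-from timestamps, refreshed by gossip while a node is alive.
    final Map<String, Long> lastUpdateMillis = new HashMap<>();

    /**
     * Records that `incoming` claims `hostId`. On a conflict, ASSUME the endpoint with the
     * later startup generation wins; the loser is kept out of tMD so it can later be
     * expired like a fat client. Returns the losing endpoint, or null if there was no conflict.
     */
    Endpoint resolveHostIdConflict(UUID hostId, Endpoint incoming) {
        Endpoint existing = tokenMetadata.get(hostId);
        if (existing == null || existing.address().equals(incoming.address())) {
            tokenMetadata.put(hostId, incoming);
            return null;
        }
        if (incoming.generation() > existing.generation()) {
            tokenMetadata.put(hostId, incoming); // incoming wins; existing drops out of tMD
            return existing;
        }
        return incoming; // existing keeps the host ID; incoming is never added to tMD
    }

    boolean isMember(String address) {
        return tokenMetadata.values().stream().anyMatch(e -> e.address().equals(address));
    }

    /**
     * Fat-client expiry with no "is the epstate dead?" check: a node outside tMD is expired
     * only once it has been silent longer than the timeout. A live node keeps refreshing
     * its timestamp, so the duration check alone protects it from being expired.
     */
    boolean isExpiredFatClient(String address, long nowMillis, long timeoutMillis) {
        Long last = lastUpdateMillis.get(address);
        return !isMember(address) && last != null && nowMillis - last > timeoutMillis;
    }
}
```

For example, if a hypothetical "old-ip" first claims a host ID with generation 100 and "new-ip" later claims it with generation 200, `resolveHostIdConflict` returns the old-ip endpoint as the loser. `isExpiredFatClient` then reports old-ip as expired only once its last update is older than the timeout; a node that is still gossiping keeps refreshing its timestamp and is never expired by this check.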

One small nuance here is that if the host IDs conflict and the loser is in tMD, then the token
conflict check is essentially useless, since we have to update the host ID before the tokens,
and the token check relies on data in tMD. This means that if a host ID conflict occurs where
the tokens differ, the loser's tokens may simply vanish, but that is highly unlikely to occur
without hand-editing the system table or crafting one specifically for this.
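The ordering problem can be illustrated with a toy model. Again this is a hypothetical sketch, not Cassandra's API: `tokenOwners` stands in for the token side of tMD, `removeEndpoint` for the removal performed while resolving the host ID conflict, and `tokenConflicts` for the token check that runs afterwards. Because the host ID step runs first and removes the loser, the token step has nothing left to compare against.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HostIdBeforeTokensSketch {
    // Hypothetical stand-in for the token side of tMD: token -> owning endpoint address.
    final Map<Long, String> tokenOwners = new TreeMap<>();

    // Hypothetical equivalent of removing the host ID loser: drop every token it owns.
    void removeEndpoint(String address) {
        tokenOwners.values().removeIf(owner -> owner.equals(address));
    }

    /**
     * Token step, run AFTER the host ID step. A conflict is only detectable while the
     * other owner is still present in the map; otherwise the token is simply claimed.
     */
    List<Long> tokenConflicts(String address, Collection<Long> tokens) {
        List<Long> conflicts = new ArrayList<>();
        for (Long token : tokens) {
            String owner = tokenOwners.get(token);
            if (owner != null && !owner.equals(address)) {
                conflicts.add(token);
            } else {
                tokenOwners.put(token, address);
            }
        }
        return conflicts;
    }
}
```

If a hypothetical "old-ip" owns tokens 10 and 20 and loses the host ID conflict, calling `removeEndpoint("old-ip")` and then `tokenConflicts("new-ip", List.of(30L))` reports no conflicts and leaves only token 30 in the map: tokens 10 and 20 have vanished, which is the edge case described above.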

> Changing the IP of a node on a live cluster leaves gossip infos and throws Exceptions
> -------------------------------------------------------------------------------------
>                 Key: CASSANDRA-6615
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Fabien Rousseau
>            Assignee: Brandon Williams
>             Fix For: 1.2.14
>         Attachments: 6615.txt
> Following this procedure to change the IP of a node, we encountered an issue:
>  - logs contain: "java.lang.RuntimeException: Host ID collision between active endpoint / and /"
>  - logs also indicate that the old IP is being removed from the cluster (FatClient timeout), then added again...
>  - nodetool gossipinfo still lists the old IP (even a few hours later...)
>  - the old IP is still seen as "UP" in the cluster (according to the logs...)
> Below is a small shell script that reproduces the scenario:
> {noformat}
> #! /bin/bash
> ccm create $CLUSTER --cassandra-dir=.
> ccm populate -n 2
> ccm start
> ccm add node3 -i -j 7300 -b
> ccm node3 start
> ccm node3 ring
> ccm node3 stop
> sed -i 's/' ~/.ccm/$CLUSTER/node3/node.conf 
> sed -i 's/' ~/.ccm/$CLUSTER/node3/conf/cassandra.yaml
> ccm node3 start
> sleep 3
> nodetool --host --port 7300 gossipinfo
> {noformat}

This message was sent by Atlassian JIRA
