cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Logs appear to contradict themselves during bootstrap steps
Date Sat, 07 Jan 2017 01:45:52 GMT
On Fri, Jan 6, 2017 at 6:45 PM, Sotirios Delimanolis <sotodel_89@yahoo.com>
wrote:

> I forgot to check nodetool gossipinfo. Still, why does the first check
> think that the address exists, but the second doesn't?
>
>
> On Friday, January 6, 2017 1:11 PM, David Berry <dberry@blackberry.com>
> wrote:
>
>
> I’ve encountered this previously: after removing a node, gossip info is
> retained for 72 hours, which prevents the IP from being reused during that
> period. You can check how long gossip will retain this information using
> “nodetool gossipinfo”, where the expiry time is shown as an epoch timestamp
> in the STATUS line.
>
> For example….
>
> nodetool gossipinfo
>
> /10.236.70.199
>   generation:1482436691
>   heartbeat:3942407
>   STATUS:3942404:LEFT,3074457345618261000,1483995662276
>   LOAD:3942267:3.60685807E8
>   SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
>   DC:20:orion
>   RACK:22:r1
>   RELEASE_VERSION:4:2.1.16
>   RPC_ADDRESS:3:10.236.70.199
>   SEVERITY:3942406:0.25094103813171387
>   NET_VERSION:1:8
>   HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
>   TOKENS:15:<hidden>
>
> Converting it from epoch…..
>
> local@img2116saturn101:~$ date -d @$((1483995662276/1000))
> Mon Jan  9 21:01:02 UTC 2017
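The same conversion can be sketched in Python; the value 1483995662276 is the millisecond expiry timestamp taken from the STATUS line in the gossipinfo output above:

```python
from datetime import datetime, timezone

# STATUS line: STATUS:3942404:LEFT,3074457345618261000,1483995662276
# The last field is the gossip expiry time in milliseconds since the epoch.
expiry_ms = 1483995662276

# Convert milliseconds to seconds, then to an aware UTC datetime.
expiry = datetime.fromtimestamp(expiry_ms // 1000, tz=timezone.utc)
print(expiry)  # 2017-01-09 21:01:02+00:00
```

This matches the `date -d` output above: the IP becomes reusable on Mon Jan 9 21:01:02 UTC 2017.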
>
> At the time, we waited out the 72-hour period before reusing the IP; I’ve
> not used replace_address before.
>
>
> *From:* Sotirios Delimanolis [mailto:sotodel_89@yahoo.com]
> *Sent:* Friday, January 6, 2017 2:38 PM
> *To:* User <user@cassandra.apache.org>
> *Subject:* Logs appear to contradict themselves during bootstrap steps
>
> We had a node go down in our cluster and its disk had to be wiped. During
> that time, all nodes in the cluster have restarted at least once.
>
> We want to add the bad node back to the ring. It has the same IP/hostname.
> I followed the steps here
> <https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html>
> for "Adding nodes to an existing cluster."
>
> When the process is started up, it reports
>
> A node with address <hostname>/<address> already exists, cancelling join.
> Use cassandra.replace_address if you want to replace this node.
>
> I found this error message in StorageService, which uses the Gossiper
> instance to look up the node's state. Apparently, the node knows about it.
> So I followed the instructions and added the cassandra.replace_address
> system property and restarted the process.
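For reference, the property is typically passed as a JVM flag when starting the replacement node. A sketch, with illustrative paths and the example IP from earlier in the thread; adjust for your install:

```shell
# On the replacement node only, before its first startup.
# Append to cassandra-env.sh (path varies by install; package installs
# often use /etc/cassandra/cassandra-env.sh):
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.236.70.199"

# Or pass it directly when starting from a tarball install:
# bin/cassandra -Dcassandra.replace_address=10.236.70.199

# Remove the flag again once the node has finished bootstrapping,
# so it doesn't try to replace itself on a later restart.
```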
>
> But it reports
>
> Cannot replace_address /<address> because it doesn't exist in gossip
>
> So which one is it? Does the ring know about it or not? Running "nodetool
> ring" does show it on all other nodes.
>
> I've seen CASSANDRA-8138
> <https://issues.apache.org/jira/browse/CASSANDRA-8138> and the conditions
> are the same, but I can't understand why it thinks it's not part of gossip.
> What's the difference between the gossip check used to make this
> determination and the gossip check used for the first error message? Can
> someone explain?
>
> I've since retrieved the node's id and used it to "nodetool removenode".
> After rebalancing, I added the node back and "nodetool cleaned" up.
> Everything's up and running, but I'd like to understand what Cassandra was
> doing.
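The recovery sequence described above looks roughly like this; the host ID is the example value from the gossipinfo output earlier in the thread:

```shell
# Find the down node's host ID (the node shows as "DN" in the output).
nodetool status

# Remove it from the ring by host ID, then wait for streaming to finish.
nodetool removenode cd2a767f-3716-4717-9106-52f0380e6184

# After the wiped node rejoins, run on each of the other nodes to drop
# data for ranges they no longer own.
nodetool cleanup
```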
>
>
>
>
>
>
In case you have not seen it, check out
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsAssassinate.html
This is what you do when you really want something to go away from gossip.
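A minimal sketch. Note that `nodetool assassinate` exists in Cassandra 2.2 and later; on older versions the equivalent is invoking the Gossiper MBean's unsafeAssassinateEndpoint operation over JMX:

```shell
# Last resort: forcibly purge the endpoint's state from gossip.
# Unlike removenode, this does NOT stream data anywhere, so use it only
# when removenode has failed or the node's data is already gone.
nodetool assassinate 10.236.70.199
```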
