incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Unreachable Nodes
Date Wed, 22 May 2013 09:54:52 GMT
Hi.

I think that "unsafeAssassinateEndpoint" was the right solution here. I
was going to point you to it after reading the first part of your
message.

"Does anyone know why the dead nodes still appear when we run "nodetool
gossipinfo" but they don't when we run "describe cluster" from the CLI?"

That's a good thing. The Gossiper just keeps this information for a while
(7 or 10 days by default, off the top of my head), and this doesn't harm
your cluster in any way, whereas having "UNREACHABLE" nodes could have been
annoying. By the way, gossipinfo shows you those nodes as "STATUS:LEFT",
which is good. I am quite sure that this status changed when you invoked
"unsafeAssassinateEndpoint" over JMX.
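
To make checking the gossip state less tedious, here is a minimal Python
sketch that parses "nodetool gossipinfo" output in the format quoted below
and lists the endpoints whose STATUS is LEFT. The function names and the
SAMPLE text are mine, for illustration only. Reading the third STATUS field
as an epoch-millisecond expiry time is an assumption based on how the values
look, not something I have verified for 1.0.11.

```python
from datetime import datetime, timezone

# Shortened sample in the same shape as the gossipinfo output quoted below.
SAMPLE = """\
/10.1.32.97
RELEASE_VERSION:1.0.11
STATUS:NORMAL,56713727820156410577229101238628035243
/10.128.16.111
REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
/10.128.16.110
REMOVAL_COORDINATOR:REMOVER,1
STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
"""


def parse_gossipinfo(text):
    """Return {endpoint: {state_key: value}} parsed from gossipinfo text."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        # Tolerate mail-quote markers ("> ") and stray indentation.
        line = raw.lstrip("> ").strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/10.1.32.97" starts a new endpoint section.
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def left_nodes(nodes):
    """Yield (endpoint, expiry) for every endpoint whose STATUS is LEFT."""
    for endpoint, state in nodes.items():
        fields = state.get("STATUS", "").split(",")
        if fields[0] != "LEFT":
            continue
        expiry = None
        # Assumption: the third field is an expiry time in epoch milliseconds.
        if len(fields) >= 3 and fields[2].isdigit():
            expiry = datetime.fromtimestamp(int(fields[2]) / 1000, tz=timezone.utc)
        yield endpoint, expiry


for endpoint, expiry in left_nodes(parse_gossipinfo(SAMPLE)):
    print(endpoint, "LEFT, assumed expiry:", expiry)
```

Feeding it the real gossipinfo text instead of SAMPLE should show at a glance
whether any LEFT entries remain.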

"do a full cluster restart (I presume that means a rolling restart - not
shut-down the entire cluster right???). "

A full restart => entire cluster down => downtime. It is precisely *not* a
rolling restart.

To conclude, I would say that your cluster seems healthy now (from what I
can see): you have no more ghost nodes and nothing left to do. Just wait a
week or so and check gossipinfo again.


2013/5/22 Vasileios Vlachos <vasileiosvlachos@gmail.com>

> Hello All,
>
> A while ago we had 3 cassandra nodes on Amazon. At some point we decided
> to buy some servers and deploy cassandra there. The problem is that since
> then we have a list of dead IPs listed as UNREACHABLE nodes when we run
> describe cluster on cassandra-cli.
>
> I have seen other posts which describe similar issues, and the bottom line
> is "it's harmless but if you want to get rid of it do a full cluster
> restart" (I presume that means a rolling restart - not shut-down the entire
> cluster right???). Anyway...
>
> We also came across another solution: Install "libmx4j-java", uncomment
> the respective line on "/etc/default/cassandra", restart the node, go to "
> http://cassandra_node:8081/mbean?objectname=org.apache.cassandra.net%3Atype%3DGossiper",
> type in the dead IP/IPs next to the "unsafeAssassinateEndpoint" and invoke
> it. So we did that on one of the nodes for the list of dead IPs. After
> running "describe cluster" on the CLI on every node, we noticed that there
> were no UNREACHABLE nodes and everything looked OK.
>
> However, when we run "nodetool gossipinfo" we get the following output:
>
> /10.1.32.97
> RELEASE_VERSION:1.0.11
> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
> LOAD:2.76851457173E11
> RPC_ADDRESS:0.0.0.0
> STATUS:NORMAL,56713727820156410577229101238628035243
> /10.128.16.111
> REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
> STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
> /10.128.16.110
> REMOVAL_COORDINATOR:REMOVER,1
> STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
> /10.1.32.100
> RELEASE_VERSION:1.0.11
> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
> LOAD:2.75649392881E11
> RPC_ADDRESS:0.0.0.0
> STATUS:NORMAL,85070591730234615865843651857942052863
> /10.1.32.101
> RELEASE_VERSION:1.0.11
> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
> LOAD:2.71158702006E11
> RPC_ADDRESS:0.0.0.0
> STATUS:NORMAL,141784319550391026443072753096570088105
> /10.1.32.98
> RELEASE_VERSION:1.0.11
> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
> LOAD:2.73163150773E11
> RPC_ADDRESS:0.0.0.0
> STATUS:NORMAL,113427455640312821154458202477256070486
> /10.128.16.112
> REMOVAL_COORDINATOR:REMOVER,1
> STATUS:LEFT,42537092606577173116506557155915918934,1369471567719
> /10.1.32.99
> RELEASE_VERSION:1.0.11
> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
> LOAD:2.72271268395E11
> RPC_ADDRESS:0.0.0.0
> STATUS:NORMAL,28356863910078205288614550619314017621
> /10.1.32.96
> RELEASE_VERSION:1.0.11
> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
> LOAD:2.71494331357E11
> RPC_ADDRESS:0.0.0.0
> STATUS:NORMAL,0
>
> Does anyone know why the dead nodes still appear when we run "nodetool
> gossipinfo" but they don't when we run "describe cluster" from the CLI?
>
> Thank you in advance for your help,
>
> Vasilis
>
