incubator-cassandra-user mailing list archives

From Vasileios Vlachos <vasileiosvlac...@gmail.com>
Subject Re: Unreachable Nodes
Date Wed, 22 May 2013 10:46:15 GMT
Hello,

Thanks for your fast response. That makes sense. I'll just keep an eye on
it then.

Many thanks,

Vasilis


On Wed, May 22, 2013 at 10:54 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi.
>
> I think that "unsafeAssassinateEndpoint" was the right solution here. I
> was going to point you to it after reading the first part of your
> message.
>
> "Does anyone know why the dead nodes still appear when we run "nodetool
> gossipinfo" but they don't when we run "describe cluster" from the CLI?"
>
> That's a good thing. The Gossiper just keeps this information for a while
> (7 or 10 days by default, off the top of my head). This doesn't harm your
> cluster in any way, whereas having "UNREACHABLE" nodes could have been
> annoying. By the way, gossipinfo shows you those nodes as "STATUS:LEFT",
> which is good. I am quite sure that this status changed when you used the
> JMX "unsafeAssassinateEndpoint" operation.
>
> "do a full cluster restart (I presume that means a rolling restart - not
> shut-down the entire cluster right???). "
>
> A full restart means the entire cluster goes down, so there is downtime.
> It is precisely *not* a rolling restart.
>
> To conclude, I would say that your cluster seems healthy now (from what I
> can see): you have no more ghost nodes and nothing to do. Just wait a week
> or so and check gossipinfo again.
>
>
> 2013/5/22 Vasileios Vlachos <vasileiosvlachos@gmail.com>
>
>> Hello All,
>>
>> A while ago we had 3 Cassandra nodes on Amazon. At some point we decided
>> to buy some servers and deploy Cassandra there. The problem is that since
>> then we have had a list of dead IPs shown as UNREACHABLE nodes when we run
>> "describe cluster" in cassandra-cli.
>>
>> I have seen other posts which describe similar issues, and the bottom
>> line is "it's harmless but if you want to get rid of it do a full cluster
>> restart" (I presume that means a rolling restart - not shut-down the entire
>> cluster right???). Anyway...
>>
>> We also came across another solution: install "libmx4j-java", uncomment
>> the respective line in "/etc/default/cassandra", restart the node, go to
>> "http://cassandra_node:8081/mbean?objectname=org.apache.cassandra.net%3Atype%3DGossiper",
>> type the dead IP(s) next to "unsafeAssassinateEndpoint" and invoke
>> it. We did that on one of the nodes for the list of dead IPs. After
>> running "describe cluster" in the CLI on every node, we noticed that there
>> were no UNREACHABLE nodes and everything looked OK.
>>
>> However, when we run "nodetool gossipinfo" we get the following output:
>>
>> /10.1.32.97
>> RELEASE_VERSION:1.0.11
>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>> LOAD:2.76851457173E11
>> RPC_ADDRESS:0.0.0.0
>> STATUS:NORMAL,56713727820156410577229101238628035243
>> /10.128.16.111
>> REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
>> STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
>> /10.128.16.110
>> REMOVAL_COORDINATOR:REMOVER,1
>> STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
>> /10.1.32.100
>> RELEASE_VERSION:1.0.11
>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>> LOAD:2.75649392881E11
>> RPC_ADDRESS:0.0.0.0
>> STATUS:NORMAL,85070591730234615865843651857942052863
>> /10.1.32.101
>> RELEASE_VERSION:1.0.11
>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>> LOAD:2.71158702006E11
>> RPC_ADDRESS:0.0.0.0
>> STATUS:NORMAL,141784319550391026443072753096570088105
>> /10.1.32.98
>> RELEASE_VERSION:1.0.11
>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>> LOAD:2.73163150773E11
>> RPC_ADDRESS:0.0.0.0
>> STATUS:NORMAL,113427455640312821154458202477256070486
>> /10.128.16.112
>> REMOVAL_COORDINATOR:REMOVER,1
>> STATUS:LEFT,42537092606577173116506557155915918934,1369471567719
>> /10.1.32.99
>> RELEASE_VERSION:1.0.11
>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>> LOAD:2.72271268395E11
>> RPC_ADDRESS:0.0.0.0
>> STATUS:NORMAL,28356863910078205288614550619314017621
>> /10.1.32.96
>> RELEASE_VERSION:1.0.11
>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>> LOAD:2.71494331357E11
>> RPC_ADDRESS:0.0.0.0
>> STATUS:NORMAL,0
>>
>> Does anyone know why the dead nodes still appear when we run "nodetool
>> gossipinfo" but they don't when we run "describe cluster" from the CLI?
>>
>> Thank you in advance for your help,
>>
>> Vasilis
>>
>
>
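
A note on the "STATUS:LEFT" entries in the gossipinfo output quoted above: the
third comma-separated field looks like a Unix timestamp in milliseconds, and it
is probably the time at which the Gossiper will drop the entry. That reading is
an assumption, not something stated in this thread. The Python sketch below
parses gossipinfo text (tolerating the ">>" quote markers) and decodes that
field. The function names parse_gossipinfo and left_nodes are made up for
illustration.

```python
from datetime import datetime, timezone


def parse_gossipinfo(text):
    """Return {endpoint: {key: value}} parsed from `nodetool gossipinfo` text."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        # Drop email quote markers (">", ">>") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/10.1.32.97" starts a new endpoint section.
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def left_nodes(nodes):
    """Map each endpoint whose STATUS is LEFT to its decoded third field (UTC).

    Assumption: the third STATUS field is an expiry time in milliseconds
    since the Unix epoch.
    """
    result = {}
    for endpoint, state in nodes.items():
        fields = state.get("STATUS", "").split(",")
        if fields[0] == "LEFT" and len(fields) >= 3:
            seconds = int(fields[2]) // 1000
            result[endpoint] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return result


# Two entries copied from the output quoted in this thread.
SAMPLE = """\
>> /10.1.32.97
>> RELEASE_VERSION:1.0.11
>> STATUS:NORMAL,56713727820156410577229101238628035243
>> /10.128.16.111
>> REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
>> STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
"""

if __name__ == "__main__":
    for endpoint, when in left_nodes(parse_gossipinfo(SAMPLE)).items():
        print(endpoint, when.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

For 10.128.16.111 this gives 2013-05-25 08:44:48 UTC, about three days after
the messages in this thread. Under this reading the entries would expire
sooner than the 7 to 10 days estimated above.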
