incubator-cassandra-user mailing list archives

From Bryce Godfrey <Bryce.Godf...@azaleos.com>
Subject RE: Completely removing a node from the cluster
Date Mon, 22 Aug 2011 04:08:38 GMT
It's been at least 4 days now.

-----Original Message-----
From: aaron morton [mailto:aaron@thelastpickle.com] 
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

I see the mistake I made about ring: it gets the endpoint list from the same place but uses the
tokens to drive the whole process.

I'm guessing here, as I don't have time to check all the code, but there is a 3 day timeout in
the gossip system. Not sure if it applies in this case.

Anyone know ?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:

> Both .2 and .3 show the same thing in the MBean: Unreachable is an empty collection, and the Live node list still shows all 3 nodes:
> 192.168.20.2
> 192.168.20.3
> 192.168.20.1
> 
> The removetoken was done a few days ago, and I believe the remove was done from .2
> 
> Here is what the ring output looks like; not sure why I get that token on the empty first line either:
> Address         DC          Rack        Status State   Load            Owns    Token
>                                                                               85070591730234615865843651857942052864
> 192.168.20.2    datacenter1 rack1       Up     Normal  79.53 GB       50.00%  0
> 192.168.20.3    datacenter1 rack1       Up     Normal  42.63 GB       50.00%  85070591730234615865843651857942052864
> 
> Yes, both nodes show the same thing when doing a describe cluster, that .1 is unreachable.
> 
> 
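The two tokens in the ring output above (0 and 85070591730234615865843651857942052864) are what an even split of the RandomPartitioner range gives for two nodes. A minimal sketch of that arithmetic, assuming the usual 0 to 2**127 token range; the helper name balanced_tokens is hypothetical and not part of nodetool:

```python
# Assumption: RandomPartitioner tokens range over 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the large token values exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(2)):
        print(index, token)
```

For two nodes this yields the same pair of tokens shown for .2 and .3, which is consistent with each owning 50.00%.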
> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com] 
> Sent: Sunday, August 21, 2011 4:23 AM
> To: user@cassandra.apache.org
> Subject: Re: Completely removing a node from the cluster
> 
> Unreachable nodes either did not respond to the message or were known to be down and were not sent a message.
> The way the node lists are obtained for the ring command and describe cluster is the same, so it's a bit odd.
> 
> Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean? What do the LiveNodes and UnreachableNodes attributes say?
> 
> Also, how long ago did you remove the token, and on which machine? Do both 20.2 and 20.3 think 20.1 is still around?
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
> 
>> I'm on 0.8.4
>> 
>> I have removed a dead node from the cluster using the nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, owning 50% of the tokens.
>> 
>> However, I can still see it being considered part of the cluster from cassandra-cli (192.168.20.1 being the removed node), and I'm worried that the cluster is still queuing up hints for the node, or that it may cause other issues:
>> 
>> Cluster Information:
>>  Snitch: org.apache.cassandra.locator.SimpleSnitch
>>  Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>  Schema versions:
>>       dcc8f680-caa4-11e0-0000-553d4dced3ff: [192.168.20.2, 192.168.20.3]
>>       UNREACHABLE: [192.168.20.1]
>> 
>> 
>> Do I need to do something else to completely remove this node?
>> 
>> Thanks,
>> Bryce
> 
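To keep an eye on whether the removed node ever drops out of the describe cluster output, the "Schema versions" block quoted in the original message can be checked from a script. A rough sketch, assuming the text has already been captured from cassandra-cli and has the layout shown in this thread; the function name parse_schema_versions and the SAMPLE constant are hypothetical:

```python
# Sample text copied from the describe cluster output quoted in this thread.
SAMPLE = """\
Cluster Information:
 Snitch: org.apache.cassandra.locator.SimpleSnitch
 Partitioner: org.apache.cassandra.dht.RandomPartitioner
 Schema versions:
      dcc8f680-caa4-11e0-0000-553d4dced3ff: [192.168.20.2, 192.168.20.3]
      UNREACHABLE: [192.168.20.1]
"""


def parse_schema_versions(text):
    """Map each schema version (or UNREACHABLE) to its list of hosts."""
    versions = {}
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Schema versions:"):
            in_block = True
            continue
        # Only lines of the form "key: [host, host]" after the header count.
        if not in_block or ": [" not in line:
            continue
        key, _, hosts = line.partition(": [")
        versions[key] = [h.strip() for h in hosts.rstrip("]").split(",") if h.strip()]
    return versions


if __name__ == "__main__":
    unreachable = parse_schema_versions(SAMPLE).get("UNREACHABLE", [])
    print("unreachable:", unreachable)
```

An empty UNREACHABLE list from this check would suggest the node is no longer being reported; it says nothing about whether hints are still being stored for it.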

