incubator-cassandra-user mailing list archives

From Jonathan Colby <jonathan.co...@gmail.com>
Subject Re: Completely removing a node from the cluster
Date Tue, 23 Aug 2011 07:45:13 GMT
I ran into this. I also tried load_ring_state=false, which did not help either. The way I got
through this was to stop the entire cluster and start the nodes one by one.

I realize this is not a practical solution for everyone, but if you can afford to stop the
cluster for a few minutes, it's worth a try.


On Aug 23, 2011, at 9:26 AM, aaron morton wrote:

> I'm running low on ideas for this one. Anyone else?
> 
> If the phantom node is not listed in the ring, other nodes should not be storing hints
> for it. You can see what nodes they are storing hints for via JConsole.
> 
> You can try a rolling restart, passing the JVM option -Dcassandra.load_ring_state=false. However,
> if the phantom node is being passed around in the gossip state, it will probably just come back
> again.
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 23/08/2011, at 3:49 PM, Bryce Godfrey wrote:
> 
>> Could this ghost node be causing my hints column family to grow to this size? I
>> also crash after about 24 hours due to commit log growth taking up all the drive space.
>> A manual nodetool flush keeps it under control, though.
>> 
>> 
>>               Column Family: HintsColumnFamily
>>               SSTable count: 6
>>               Space used (live): 666480352
>>               Space used (total): 666480352
>>               Number of Keys (estimate): 768
>>               Memtable Columns Count: 1043
>>               Memtable Data Size: 461773
>>               Memtable Switch Count: 3
>>               Read Count: 38
>>               Read Latency: 131.289 ms.
>>               Write Count: 582108
>>               Write Latency: 0.019 ms.
>>               Pending Tasks: 0
>>               Key cache capacity: 7
>>               Key cache size: 6
>>               Key cache hit rate: 0.8333333333333334
>>               Row cache: disabled
>>               Compacted row minimum size: 2816160
>>               Compacted row maximum size: 386857368
>>               Compacted row mean size: 120432714
>> 
>> Is there a way for me to manually remove this dead node?
>> 
>> -----Original Message-----
>> From: Bryce Godfrey [mailto:Bryce.Godfrey@azaleos.com] 
>> Sent: Sunday, August 21, 2011 9:09 PM
>> To: user@cassandra.apache.org
>> Subject: RE: Completely removing a node from the cluster
>> 
>> It's been at least 4 days now.
>> 
>> -----Original Message-----
>> From: aaron morton [mailto:aaron@thelastpickle.com] 
>> Sent: Sunday, August 21, 2011 3:16 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Completely removing a node from the cluster
>> 
>> I see the mistake I made about ring: it gets the endpoint list from the same place, but
>> uses the tokens to drive the whole process.
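Aaron's point is that the two commands disagree because one of them is driven by tokens. This is a hypothetical model, not Cassandra's implementation. It assumes `nodetool ring` effectively lists only endpoints that own a token, and that `describe cluster` reports gossip membership (live plus unreachable). An endpoint that gossip still knows about but that owns no token is the kind of "phantom" discussed in this thread. `RingVsGossip` and its methods are made-up names. The addresses and tokens come from Bryce's messages.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model, not Cassandra's implementation: a token-ordered map stands in
// for the ring, and plain sets stand in for gossip's live/unreachable membership.
public class RingVsGossip {

    // What a token-driven view (like nodetool ring) would list: only token owners.
    static List<String> ringEndpoints(SortedMap<BigInteger, String> tokenToEndpoint) {
        return new ArrayList<>(tokenToEndpoint.values());
    }

    // Endpoints gossip still knows about (live or unreachable) that own no token.
    static Set<String> phantomEndpoints(SortedMap<BigInteger, String> tokenToEndpoint,
                                        Set<String> live, Set<String> unreachable) {
        Set<String> phantoms = new TreeSet<>(live);
        phantoms.addAll(unreachable);
        phantoms.removeAll(tokenToEndpoint.values());
        return phantoms;
    }

    public static void main(String[] args) {
        // Tokens from Bryce's ring output; gossip membership from his LiveNodes reading.
        SortedMap<BigInteger, String> tokens = new TreeMap<>();
        tokens.put(BigInteger.ZERO, "192.168.20.2");
        tokens.put(new BigInteger("85070591730234615865843651857942052864"), "192.168.20.3");
        Set<String> live = Set.of("192.168.20.1", "192.168.20.2", "192.168.20.3");

        System.out.println("ring:     " + ringEndpoints(tokens));
        System.out.println("phantoms: " + phantomEndpoints(tokens, live, Set.of()));
    }
}
```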
>> 
>> I'm guessing here; I don't have time to check all the code. But there is a 3-day timeout
>> in the gossip system. Not sure if it applies in this case.
>> 
>> Anyone know?
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:
>> 
>>> Both .2 and .3 report the same thing from the MBean: UnreachableNodes is an empty collection,
>>> and LiveNodes still lists all 3 nodes:
>>> 192.168.20.2
>>> 192.168.20.3
>>> 192.168.20.1
>>> 
>>> The removetoken was done a few days ago, and I believe it was run from .2.
>>> 
>>> Here is what the ring output looks like; I'm not sure why I get that token on the empty
>>> first line either:
>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>                                                                                85070591730234615865843651857942052864
>>> 192.168.20.2    datacenter1 rack1       Up     Normal  79.53 GB        50.00%  0
>>> 192.168.20.3    datacenter1 rack1       Up     Normal  42.63 GB        50.00%  85070591730234615865843651857942052864
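The 50.00% figures follow from the tokens themselves. This minimal sketch assumes RandomPartitioner's token space runs from 0 to 2^127. It also assumes each node owns the range from the previous token (exclusive) up to its own token (inclusive), with the first node's range wrapping around from the last token. Under those assumptions it reproduces the Owns column. 85070591730234615865843651857942052864 is exactly 2^126, half of the ring.

The blank-address line carries the same token as .3, the last one in the ring. That suggests it marks where the ring wraps around to the first node, but this is an interpretation rather than something stated in the thread. `TokenOwnership` is a hypothetical helper.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch assuming RandomPartitioner's token space [0, 2^127) and that each node owns
// (previous token, own token], with the first node's range wrapping from the last token.
public class TokenOwnership {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Fraction of the ring owned by each token; tokens must already be sorted ascending.
    static List<Double> ownership(List<BigInteger> sortedTokens) {
        int n = sortedTokens.size();
        List<Double> owns = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            BigInteger current = sortedTokens.get(i);
            BigInteger previous = sortedTokens.get((i + n - 1) % n);
            // mod() keeps the wrapped-around range non-negative; a lone node owns everything.
            BigInteger width = (n == 1) ? RING_SIZE : current.subtract(previous).mod(RING_SIZE);
            owns.add(width.doubleValue() / RING_SIZE.doubleValue());
        }
        return owns;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = List.of(
                BigInteger.ZERO,
                new BigInteger("85070591730234615865843651857942052864"));
        List<Double> owns = ownership(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.printf("%-40s %.2f%%%n", tokens.get(i), owns.get(i) * 100);
        }
        // The second token is exactly half of the ring.
        System.out.println(tokens.get(1).equals(BigInteger.ONE.shiftLeft(126)));
    }
}
```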
>>> 
>>> Yes, both nodes show the same thing when doing a describe cluster: .1 is
>>> unreachable.
>>> 
>>> 
>>> -----Original Message-----
>>> From: aaron morton [mailto:aaron@thelastpickle.com] 
>>> Sent: Sunday, August 21, 2011 4:23 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Completely removing a node from the cluster
>>> 
>>> Unreachable nodes either did not respond to the message, or were known to be
>>> down and were not sent a message.
>>> The way the node lists are obtained for the ring command and describe cluster
>>> are the same. So it's a bit odd.
>>> 
>>> Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean?
>>> What do the LiveNodes and UnreachableNodes attributes say?
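If JConsole is not handy, the same two attributes can be read with the standard `javax.management` remote API. This minimal sketch assumes the node's JMX port is 7199, the 0.8 default, and that JMX authentication is disabled. The MBean and attribute names are the ones Aaron refers to (StorageService, LiveNodes, UnreachableNodes). `GossipView` is a hypothetical class name.

```java
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GossipView {
    // Assumption: default JMX port for Cassandra 0.8, no JMX authentication configured.
    static final int DEFAULT_JMX_PORT = 7199;

    // Standard RMI-based JMX service URL for a host/port pair.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, DEFAULT_JMX_PORT));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService =
                    new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Both attributes come back as lists of endpoint address strings.
            List<?> live = (List<?>) mbs.getAttribute(storageService, "LiveNodes");
            List<?> unreachable = (List<?>) mbs.getAttribute(storageService, "UnreachableNodes");
            System.out.println("LiveNodes:        " + live);
            System.out.println("UnreachableNodes: " + unreachable);
        }
    }
}
```

Run it once against each of .2 and .3, for example passing 192.168.20.2 as the first argument, and compare the two outputs.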
>>> 
>>> Also, how long ago did you remove the token, and on which machine? Do both 20.2
>>> and 20.3 think 20.1 is still around?
>>> 
>>> Cheers
>>> 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
>>> 
>>>> I'm on 0.8.4
>>>> 
>>>> I have removed a dead node from the cluster using the nodetool removetoken command,
>>>> and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when
>>>> I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, each
>>>> owning 50% of the ring.
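Rebalancing the remaining nodes comes down to choosing evenly spaced tokens. For RandomPartitioner, a common rule is token_i = i × 2^127 / N for N nodes. This minimal sketch computes those values; `BalancedTokens` is a hypothetical name. For N = 2 it yields 0 and 85070591730234615865843651857942052864, which matches the ring output elsewhere in this thread. A node would then be moved to its computed token with nodetool move.

```java
import java.math.BigInteger;

public class BalancedTokens {
    // RandomPartitioner's token space is assumed to be [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Token for node i of n: i * 2^127 / n, using integer division.
    static BigInteger token(int i, int n) {
        return RING_SIZE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        for (int i = 0; i < n; i++) {
            System.out.println("node " + i + ": " + token(i, n));
        }
    }
}
```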
>>>> 
>>>> However, I can still see it being considered part of the cluster from
>>>> cassandra-cli (192.168.20.1 being the removed node), and I'm worried that the cluster is
>>>> still queuing up hints for the node, or about any other issues it may cause:
>>>> 
>>>> Cluster Information:
>>>> Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>> Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>> Schema versions:
>>>>     dcc8f680-caa4-11e0-0000-553d4dced3ff: [192.168.20.2, 192.168.20.3]
>>>>     UNREACHABLE: [192.168.20.1]
>>>> 
>>>> 
>>>> Do I need to do something else to completely remove this node?
>>>> 
>>>> Thanks,
>>>> Bryce
>>> 
>> 
> 

