cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: nodetool move trying to stream data to node no longer in cluster
Date Thu, 26 May 2011 21:24:12 GMT
Off the top of my head, the simple way to stop invalid endpoint state being passed around is
a full cluster stop. Obviously that's not an option. The problem is that if one node has the IP,
it will share it around with the others.

Out of interest, take a look at the o.a.c.db.FailureDetector MBean getAllEndpointStates() function.
That returns the endpoint state held by the Gossiper. I think you should see the phantom
IP listed in there.
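
For anyone who would rather script that check than click through JConsole, here is a minimal Java sketch that reads the AllEndpointStates attribute over JMX and looks for a given IP in the returned text. The class name EndpointStateCheck and its helper methods are hypothetical, and the MBean object name used as the default is an assumption that may differ between Cassandra versions, so confirm it in JConsole first. The host, JMX port and IP are taken from the command line rather than assumed.

```java
import java.util.regex.Pattern;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class EndpointStateCheck {
    // ASSUMPTION: object name of the FailureDetector MBean. Verify it in
    // JConsole for your version, or pass a different name as the 4th argument.
    static final String ASSUMED_FD_MBEAN = "org.apache.cassandra.net:type=FailureDetector";

    // Standard JMX-over-RMI service URL for a remote JVM.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // True if the dump contains the IP as a whole address, so that
    // 10.46.108.10 does not match inside 10.46.108.102.
    static boolean mentionsEndpoint(String dump, String ip) {
        Pattern p = Pattern.compile("(?<![0-9.])" + Pattern.quote(ip) + "(?![0-9])");
        return p.matcher(dump).find();
    }

    // Reads the attribute backing getAllEndpointStates() from one node.
    static String fetchAllEndpointStates(String host, int port, String mbeanName) throws Exception {
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)));
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            Object value = mbs.getAttribute(new ObjectName(mbeanName), "AllEndpointStates");
            return String.valueOf(value);
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: EndpointStateCheck <host> <jmx-port> <ip-to-find> [mbean-name]");
            return;
        }
        String mbeanName = args.length > 3 ? args[3] : ASSUMED_FD_MBEAN;
        String dump = fetchAllEndpointStates(args[0], Integer.parseInt(args[1]), mbeanName);
        System.out.println(dump);
        System.out.println(args[2] + (mentionsEndpoint(dump, args[2])
                ? " is present in the endpoint state"
                : " is not present in the endpoint state"));
    }
}
```

Running it against each node in turn would show which nodes still hold state for the removed IP.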

If it's only on some nodes, *perhaps* restarting the node with the JVM option -Dcassandra.load_ring_state=false
*may* help. That will stop the node from loading its saved ring state and force it to get
it via gossip. Again, if there are other nodes with the phantom IP, it may just get it again.


I'll do some digging and try to get back to you. This pops up from time to time and, thinking
out loud, I wonder if it would be possible to add a new application state that purges an IP
from the ring, e.g. VersionedValue.STATUS_PURGED, which works with a TTL so that it goes through
X gossip rounds and then disappears.
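
To make the idea concrete, here is a toy Java sketch of a purge marker whose TTL is counted in gossip rounds. None of this is Cassandra code: STATUS_PURGED is only a suggestion in this message, and the class PurgeTtlSketch and all of its methods are hypothetical names used purely for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one node's view of the ring; not Cassandra's Gossiper.
class PurgeTtlSketch {
    // Endpoints this node currently believes are in the ring.
    private final Set<String> knownEndpoints = new HashSet<>();
    // Endpoint -> gossip rounds left before the purge marker is forgotten.
    private final Map<String, Integer> purgeRoundsLeft = new HashMap<>();

    // Drop the endpoint and remember the purge for ttlRounds gossip rounds.
    void markPurged(String endpoint, int ttlRounds) {
        if (ttlRounds <= 0) {
            throw new IllegalArgumentException("ttlRounds must be positive");
        }
        knownEndpoints.remove(endpoint);
        purgeRoundsLeft.put(endpoint, ttlRounds);
    }

    // A peer gossips about an endpoint; refuse it while the purge marker lives.
    boolean acceptFromGossip(String endpoint) {
        if (purgeRoundsLeft.containsKey(endpoint)) {
            return false;
        }
        knownEndpoints.add(endpoint);
        return true;
    }

    // Called once per gossip round: age every purge marker, drop expired ones.
    void onGossipRound() {
        Iterator<Map.Entry<String, Integer>> it = purgeRoundsLeft.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            int left = entry.getValue() - 1;
            if (left <= 0) {
                it.remove();
            } else {
                entry.setValue(left);
            }
        }
    }

    boolean knows(String endpoint) {
        return knownEndpoints.contains(endpoint);
    }

    boolean isPurging(String endpoint) {
        return purgeRoundsLeft.containsKey(endpoint);
    }
}
```

In this toy model, once the marker expires a peer that still holds the phantom IP can hand it straight back, so the TTL would need to outlast the time it takes every node to see the purge.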

Hope that helps. 

   
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 May 2011, at 19:58, Jonathan Colby wrote:

> @Aaron -
> 
> Unfortunately I'm still seeing messages like "<ip-of-removed-node> is down", removing from gossip, although not with the same frequency.
> 
> And repair/move jobs don't seem to try to stream data to the removed node anymore.
> 
> Does anyone know how to totally purge any stored gossip/endpoint data on nodes that were removed from the cluster? Or what might be happening here otherwise?
> 
> 
> On May 26, 2011, at 9:10 AM, aaron morton wrote:
> 
>> Cool. I was going to suggest that, but as you already had the move running I thought it might be a little drastic.
>> 
>> Did it show any progress? If the IP address is not responding, there should have been some sort of error.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 26 May 2011, at 15:28, jonathan.colby@gmail.com wrote:
>> 
>>> Seems like it had something to do with stale endpoint information. I did a rolling restart of the whole cluster and that seemed to trigger the nodes to remove the node that was decommissioned.
>>> 
>>> On , aaron morton <aaron@thelastpickle.com> wrote:
>>>> Is it showing progress? It may just be a problem with the information printed out.
>>>> 
>>>> Can you check from the other nodes in the cluster to see if they are receiving the stream?
>>>> 
>>>> cheers
>>>> 
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 26 May 2011, at 00:42, Jonathan Colby wrote:
>>>> 
>>>>> I recently removed a node (with decommission) from our cluster.
>>>>> 
>>>>> I added a couple of new nodes and am now trying to rebalance the cluster using nodetool move.
>>>>> 
>>>>> However, netstats shows that the node being "moved" is trying to stream data to the node that I already decommissioned yesterday.
>>>>> 
>>>>> The removed node was powered off and taken out of DNS; its IP is not even pingable. It was never a seed either.
>>>>> 
>>>>> This is Cassandra 0.7.5 on 64-bit Linux. How do I tell the cluster that this node is gone? Gossip should have detected this. The ring command shows the correct cluster IPs.
>>>>> 
>>>>> Here is a portion of netstats. 10.46.108.102 is the node which was removed.
>>>>> 
>>>>> Mode: Leaving: streaming data to other nodes
>>>>> Streaming to: /10.46.108.102
>>>>> /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97
>>>>> ...................
>>>>> 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266)
>>>>>       progress=280574376402/12434049900 - 2256%
>>>>> .....
>>>>> 
>>>>> Note 10.46.108.102 is NOT part of the ring.
>>>>> 
>>>>> Address         Status State   Load            Owns    Token
>>>>>                                                     148873535527910577765226390751398592512
>>>>> 10.46.108.100   Up     Normal  71.73 GB        12.50%  0
>>>>> 10.46.108.101   Up     Normal  109.69 GB       12.50%  21267647932558653966460912964485513216
>>>>> 10.47.108.100   Up     Leaving 281.95 GB       37.50%  85070591730234615865843651857942052863
>>>>> 10.47.108.102   Up     Normal  210.77 GB       0.00%   85070591730234615865843651857942052864
>>>>> 10.47.108.101   Up     Normal  289.59 GB       16.67%  113427455640312821154458202477256070484
>>>>> 10.46.108.103   Up     Normal  299.87 GB       8.33%   127605887595351923798765477786913079296
>>>>> 10.47.108.103   Up     Normal  94.99 GB        12.50%  148873535527910577765226390751398592511
>>>>> 10.46.108.104   Up     Normal  103.01 GB       0.00%   148873535527910577765226390751398592512
>>>>> 
>> 
> 

