incubator-cassandra-user mailing list archives

From jonathan.co...@gmail.com
Subject Re: Re: nodetool move trying to stream data to node no longer in cluster
Date Thu, 26 May 2011 22:48:28 GMT
Hi Aaron - Thanks a lot for the great feedback. I'll try your suggestion of
removing it as an endpoint with JMX.

On , aaron morton <aaron@thelastpickle.com> wrote:
> Off the top of my head, the simple way to stop invalid endpoint state
> being passed around is a full cluster stop. Obviously that's not an option.
> The problem is that if one node has the IP, it will share it with the
> others.



> Out of interest take a look at the oacdb.FailureDetector MBean  
> getAllEndpointStates() function. That returns the end point state held by  
> the Gossiper. I think you should see the Phantom IP listed in there.
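
One way to act on the suggestion above is to compare the addresses that appear in the gossip endpoint-state dump with the addresses the ring reports. The Python sketch below is only an illustration: the function name `find_phantom_endpoints` is hypothetical, and the sample dump text is made up, because the exact format returned by getAllEndpointStates() is not shown in this thread. It simply extracts every IPv4 address from whatever text it is given.

```python
import re

# Matches dotted-quad IPv4 addresses anywhere in free-form text.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_phantom_endpoints(endpoint_state_text, ring_ips):
    """Return addresses mentioned in the gossip dump that are not in the ring."""
    gossip_ips = set(IPV4_PATTERN.findall(endpoint_state_text))
    return sorted(gossip_ips - set(ring_ips))


# Addresses taken from the ring output quoted later in this thread.
RING_IPS = [
    "10.46.108.100", "10.46.108.101", "10.47.108.100", "10.47.108.102",
    "10.47.108.101", "10.46.108.103", "10.47.108.103", "10.46.108.104",
]

# HYPOTHETICAL dump text for illustration only; the real output format may differ.
SAMPLE_DUMP = """
/10.46.108.100
/10.46.108.102
/10.47.108.100
"""

if __name__ == "__main__":
    print(find_phantom_endpoints(SAMPLE_DUMP, RING_IPS))
```

Any address this returns is known to the Gossiper on that node but absent from the ring, which is the symptom described in this thread. Running the same check against each node would show whether the phantom IP is held everywhere or only on some nodes.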



> If it's only on some nodes *perhaps* restarting the node with the JVM  
> option -Dcassandra.load_ring_state=false *may* help. That will stop the  
> node from loading its saved ring state and force it to get it via gossip.  
> Again, if there are other nodes with the phantom IP it may just get it  
> again.



> I'll do some digging and try to get back to you. This pops up from time
> to time and, thinking out loud, I wonder if it would be possible to add a
> new application state that purges an IP from the ring, e.g.
> VersionedValue.STATUS_PURGED, which works with a TTL so it goes through X
> number of gossip rounds and then disappears.
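
The purge-with-TTL idea floated above can be pictured with a toy simulation. This is not Cassandra code, and STATUS_PURGED is only a proposal in this message, not an existing state. The `Node` class, the `gossip_round` function, and the full-mesh exchange (every node hears from every other node each round) are all simplifying assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    endpoints: set = field(default_factory=set)  # IPs this node believes are in the cluster
    purges: dict = field(default_factory=dict)   # purged IP -> remaining gossip rounds (TTL)


def gossip_round(nodes):
    """One simplified round: every node merges state from every other node."""
    # Snapshot outgoing state first so the result does not depend on node order.
    sent = [(set(n.endpoints), dict(n.purges)) for n in nodes]
    for i, node in enumerate(nodes):
        for j, (endpoints, purges) in enumerate(sent):
            if i == j:
                continue
            for ip, ttl in purges.items():
                node.purges[ip] = max(node.purges.get(ip, 0), ttl)
            node.endpoints |= endpoints
    for node in nodes:
        # A live purge marker wins over any endpoint state for the same IP.
        node.endpoints -= set(node.purges)
        # Age the markers; one that reaches zero disappears.
        node.purges = {ip: ttl - 1 for ip, ttl in node.purges.items() if ttl > 1}


def run_demo(rounds=3, ttl=3):
    live = {"10.46.108.100", "10.46.108.101"}
    phantom = "10.46.108.102"
    nodes = [
        Node("a", live | {phantom}),
        Node("b", live | {phantom}),
        Node("c", set(live)),
    ]
    nodes[2].purges[phantom] = ttl  # inject the purge marker at a single node
    for _ in range(rounds):
        gossip_round(nodes)
    return nodes


if __name__ == "__main__":
    for n in run_demo():
        print(n.name, sorted(n.endpoints), n.purges)
```

In this model the marker reaches every node in the first round, so the phantom IP is dropped everywhere before the marker expires. Real gossip contacts only a few peers per round, so the TTL would have to outlast the time the marker needs to reach the whole cluster, otherwise a node that never saw it could reintroduce the IP.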



> Hope that helps.





> -----------------

> Aaron Morton

> Freelance Cassandra Developer

> @aaronmorton

> http://www.thelastpickle.com



> On 26 May 2011, at 19:58, Jonathan Colby wrote:



> > @Aaron -

> >

> > Unfortunately I'm still seeing messages like: " is down", removing from
> > gossip, although not with the same frequency.

> >

> > And repair/move jobs don't seem to try to stream data to the removed
> > node anymore.

> >

> > Does anyone know how to totally purge any stored gossip/endpoint data for
> > nodes that were removed from the cluster? Or what might be happening here
> > otherwise?

> >

> >

> > On May 26, 2011, at 9:10 AM, aaron morton wrote:

> >

> >> Cool. I was going to suggest that, but as you already had the move
> >> running I thought it might be a little drastic.

> >>

> >> Did it show any progress? If the IP address is not responding there
> >> should have been some sort of error.

> >>

> >> Cheers

> >>

> >>

> >> On 26 May 2011, at 15:28, jonathan.colby@gmail.com wrote:

> >>

> >>> Seems like it had something to do with stale endpoint information. I
> >>> did a rolling restart of the whole cluster and that seemed to trigger the
> >>> nodes to remove the node that was decommissioned.

> >>>

> >>> On , aaron morton <aaron@thelastpickle.com> wrote:

> >>>> Is it showing progress? It may just be a problem with the
> >>>> information printed out.
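
The netstats output quoted further down in this thread reports `progress=280574376402/12434049900 - 2256%`. The small Python check below shows that the printed percentage agrees with the two byte counts and that the first count is larger than the stated total. The helper names are hypothetical, and the thread does not say how netstats derives these numbers, so this only confirms that the figures are internally consistent, not where the inflated count comes from.

```python
def percent_complete(bytes_done, bytes_total):
    """Integer percentage, truncated toward zero."""
    return bytes_done * 100 // bytes_total


def progress_is_plausible(bytes_done, bytes_total):
    """A transfer should never report more bytes sent than the total size."""
    return 0 <= bytes_done <= bytes_total


# Figures copied from the netstats output in this thread.
DONE, TOTAL = 280574376402, 12434049900

if __name__ == "__main__":
    print(percent_complete(DONE, TOTAL))
    print(progress_is_plausible(DONE, TOTAL))
```

A progress counter more than 22 times the total fits the suspicion that the reported figure is unreliable, although it does not rule out data being resent repeatedly to an unreachable target.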

> >>>>

> >>>>

> >>>>

> >>>> Can you check from the other nodes in the cluster to see if they are
> >>>> receiving the stream?

> >>>>

> >>>>

> >>>>

> >>>> cheers

> >>>>

> >>>>

> >>>>

> >>>>

> >>>>

> >>>>

> >>>> On 26 May 2011, at 00:42, Jonathan Colby wrote:

> >>>>

> >>>>

> >>>>

> >>>>> I recently removed a node (with decommission) from our cluster.
> >>>>>
> >>>>> I added a couple of new nodes and am now trying to rebalance the
> >>>>> cluster using nodetool move.
> >>>>>
> >>>>> However, netstats shows that the node being "moved" is trying to
> >>>>> stream data to the node that I already decommissioned yesterday.
> >>>>>
> >>>>> The removed node was powered off and taken out of DNS; its IP is not
> >>>>> even pingable. It was never a seed either.
> >>>>>
> >>>>> This is Cassandra 0.7.5 on 64-bit Linux. How do I tell the cluster
> >>>>> that this node is gone? Gossip should have detected this. The ring
> >>>>> command shows the correct cluster IPs.
> >>>>>
> >>>>> Here is a portion of netstats. 10.46.108.102 is the node which was
> >>>>> removed.

> >>>>

> >>>>>

> >>>>

> >>>>> Mode: Leaving: streaming data to other nodes

> >>>>

> >>>>> Streaming to: /10.46.108.102

> >>>>

> >>>>>  
> /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97

> >>>>

> >>>>> ...................

> >>>>

> >>>>>  
> 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266)

> >>>>

> >>>>> progress=280574376402/12434049900 - 2256%

> >>>>

> >>>>> .....

> >>>>

> >>>>>

> >>>>

> >>>>>

> >>>>

> >>>>> Note 10.46.108.102 is NOT part of the ring.

> >>>>

> >>>>>

> >>>>

> >>>>> Address        Status State   Load       Owns    Token
> >>>>>                                                  148873535527910577765226390751398592512
> >>>>> 10.46.108.100  Up     Normal  71.73 GB   12.50%  0
> >>>>> 10.46.108.101  Up     Normal  109.69 GB  12.50%  21267647932558653966460912964485513216
> >>>>> 10.47.108.100  Up     Leaving 281.95 GB  37.50%  85070591730234615865843651857942052863
> >>>>> 10.47.108.102  Up     Normal  210.77 GB  0.00%   85070591730234615865843651857942052864
> >>>>> 10.47.108.101  Up     Normal  289.59 GB  16.67%  113427455640312821154458202477256070484
> >>>>> 10.46.108.103  Up     Normal  299.87 GB  8.33%   127605887595351923798765477786913079296
> >>>>> 10.47.108.103  Up     Normal  94.99 GB   12.50%  148873535527910577765226390751398592511
> >>>>> 10.46.108.104  Up     Normal  103.01 GB  0.00%   148873535527910577765226390751398592512

> >>>>

> >>>>>

> >>>>

> >>>>>

> >>>>

> >>>>>

> >>>>

> >>>>

> >>>>

> >>

> >


