incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Unreachable node, not in nodetool ring
Date Fri, 27 Jul 2012 09:04:49 GMT
Hi again,

Does nobody have a clue about this issue?

I'm still facing this problem.

Alain

2012/7/23 Alain RODRIGUEZ <arodrime@gmail.com>:
> Does anyone know how to completely remove a dead node that only appears
> when running "describe cluster" from the CLI?
>
> I still have this issue in my production cluster.
>
> Alain
>
> 2012/7/20 Alain RODRIGUEZ <arodrime@gmail.com>:
>> Hi Aaron,
>>
>> I have already run repair and cleanup on both nodes, and I did so after every
>> change to my ring (it took me a while, btw :)).
>>
>> The node *.211 is actually out of the ring and out of my control
>> because I no longer have the server (the EC2 instance was terminated a few
>> days ago).
>>
>> Alain
>>
>> 2012/7/20 aaron morton <aaron@thelastpickle.com>:
>>> I would:
>>>
>>> * run repair on 10.58.83.109
>>> * run cleanup on 10.59.21.241 (I assume this was the first node).
>>>
>>> It looks like 10.56.62.211 is out of the cluster.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:
>>>
>>> Not sure if this may help :
>>>
>>> nodetool -h localhost gossipinfo
>>> /10.58.83.109
>>>  RELEASE_VERSION:1.1.2
>>>  RACK:1b
>>>  LOAD:5.9384978406E10
>>>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>>>  DC:eu-west
>>>  STATUS:NORMAL,85070591730234615865843651857942052864
>>>  RPC_ADDRESS:0.0.0.0
>>> /10.248.10.94
>>>  RELEASE_VERSION:1.1.2
>>>  LOAD:3.0128207422E10
>>>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>>>  STATUS:LEFT,0,1342866804032
>>>  RPC_ADDRESS:0.0.0.0
>>> /10.56.62.211
>>>  RELEASE_VERSION:1.1.2
>>>  LOAD:11594.0
>>>  RACK:1b
>>>  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
>>>  DC:eu-west
>>>  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
>>>  STATUS:removed,170141183460469231731687303715884105727,1342453967415
>>>  RPC_ADDRESS:0.0.0.0
>>> /10.59.21.241
>>>  RELEASE_VERSION:1.1.2
>>>  RACK:1b
>>>  LOAD:1.08667047094E11
>>>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>>>  DC:eu-west
>>>  STATUS:NORMAL,0
>>>  RPC_ADDRESS:0.0.0.0
>>>
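The gossipinfo paste above can be scanned mechanically for endpoints that gossip still remembers in a non-NORMAL state. Below is a minimal Python sketch, assuming the output keeps the layout shown in the paste (an endpoint line starting with "/", followed by indented KEY:value lines). The names parse_gossipinfo and non_normal_endpoints are hypothetical helpers introduced here, not part of nodetool.

```python
def parse_gossipinfo(text):
    """Parse `nodetool gossipinfo`-style text into {endpoint: {KEY: value}}.

    Assumes the layout seen in the paste: "/<ip>" starts a new endpoint,
    and each following "KEY:value" line belongs to that endpoint.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only; values may contain commas.
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def non_normal_endpoints(nodes):
    """Return {endpoint: status} for endpoints whose STATUS is not NORMAL."""
    flagged = {}
    for endpoint, fields in nodes.items():
        status = fields.get("STATUS", "")
        state = status.split(",", 1)[0]
        if state != "NORMAL":
            flagged[endpoint] = status
    return flagged


# Trimmed copy of the paste above (only the fields used here).
SAMPLE = """\
/10.58.83.109
  STATUS:NORMAL,85070591730234615865843651857942052864
/10.248.10.94
  STATUS:LEFT,0,1342866804032
/10.56.62.211
  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
  STATUS:removed,170141183460469231731687303715884105727,1342453967415
/10.59.21.241
  STATUS:NORMAL,0
"""

if __name__ == "__main__":
    for endpoint, status in non_normal_endpoints(parse_gossipinfo(SAMPLE)).items():
        print(endpoint, status)
```

On the paste above this flags 10.248.10.94 (LEFT) and 10.56.62.211 (removed), the two endpoints that no longer show up in nodetool ring.
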
>>> Story :
>>>
>>> I had a 2-node cluster:
>>>
>>> 10.248.10.94 Token 0
>>> 10.59.21.241 Token 85070591730234615865843651857942052864
>>>
>>> I had to replace node 10.248.10.94, so I added 10.56.62.211 on token 0 - 1
>>> (170141183460469231731687303715884105727). This failed, so I removed the
>>> token.
>>>
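The token values in this story follow from RandomPartitioner arithmetic, which a few lines of Python can check. This is only a sketch, under the assumption that the token space holds 2**127 values (0 through 2**127 - 1), as is commonly documented for RandomPartitioner. The names balanced_tokens and token_before are hypothetical, introduced here for illustration.

```python
# Assumption: RandomPartitioner tokens are integers in [0, 2**127 - 1].
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def token_before(token):
    """The token immediately before `token`, wrapping around the ring."""
    return (token - 1) % RING_SIZE


if __name__ == "__main__":
    # Two nodes: 0 and 2**126, the two tokens used in this cluster.
    print(balanced_tokens(2))
    # "Token 0 - 1" wraps to the last token of the ring, 2**127 - 1.
    print(token_before(0))
```

With two nodes this gives 0 and 85070591730234615865843651857942052864, and "token 0 - 1" wraps around to 170141183460469231731687303715884105727, which is the token gossip still associates with 10.56.62.211.
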
>>> I repeated the previous operation with the node 10.59.21.241 and it went
>>> fine. Next I decommissioned the node 10.248.10.94 and moved
>>> 10.59.21.241 to the token 0.
>>>
>>> Now I am in the situation described before.
>>>
>>> Alain
>>>
>>>
>>> 2012/7/19 Alain RODRIGUEZ <arodrime@gmail.com>:
>>>
>>> Hi, I wasn't able to see the token currently used by the 10.56.62.211
>>> (ghost node).
>>>
>>> I already removed the token 6 days ago:
>>>
>>> -> "Removing token 170141183460469231731687303715884105727 for
>>> /10.56.62.211"
>>>
>>> "- check in cassandra log. It is possible you see a log line telling
>>> you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
>>> token"
>>>
>>> Nothing like that in the logs.
>>>
>>> I tried the following without success:
>>>
>>> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
>>> Exception in thread "main" java.lang.UnsupportedOperationException: Token not found.
>>> ...
>>>
>>> I really thought this was going to work :-).
>>>
>>> Any other ideas?
>>>
>>> Alain
>>>
>>> PS: I heard that Octo is a nice company and you use Cassandra, so I
>>> guess you're fine in there :-). I wish you the best, and thanks for your
>>> help.
>>>
>>> 2012/7/19 Olivier Mallassi <omallassi@octo.com>:
>>>
>>> I got that a couple of times (due to DNS issues in our infra).
>>>
>>> What you could try:
>>>
>>> - check in cassandra log. It is possible you see a log line telling you
>>>   10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token
>>> - if 10.56.62.211 is up, try decommission (via nodetool)
>>> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
>>> - use removetoken (via nodetool) to remove the token associated with
>>>   10.56.62.211. In case of failure, you can use removetoken -f instead.
>>>
>>> Then the unreachable IP should have disappeared.
>>>
>>> HTH
>>>
>>>
>>> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ <arodrime@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I tried to add a node a few days ago and it failed. I finally made it
>>> work with another node, but now when I run describe cluster on the CLI
>>> I get this:
>>>
>>>
>>> Cluster Information:
>>>   Snitch: org.apache.cassandra.locator.Ec2Snitch
>>>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>   Schema versions:
>>>      UNREACHABLE: [10.56.62.211]
>>>      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
>>>
>>>
>>> And nodetool ring gives me:
>>>
>>> Address         DC          Rack        Status State   Load            Owns                Token
>>>                                                                                            85070591730234615865843651857942052864
>>> 10.59.21.241    eu-west     1b          Up     Normal  101.17 GB       50.00%              0
>>> 10.58.83.109    eu-west     1b          Up     Normal  55.27 GB        50.00%              85070591730234615865843651857942052864
>>>
>>>
>>> The point, as you can see, is that one of my nodes holds twice as much
>>> data as the other one. I have RF = 2 defined.
>>>
>>> My guess is that the token 0 node keeps data for the unreachable node.
>>>
>>> The IP of the unreachable node doesn't belong to me anymore; I have no
>>> access to this ghost node.
>>>
>>> Does someone know how to completely remove this ghost node from my cluster?
>>>
>>> Thank you.
>>>
>>> Alain
>>>
>>>
>>> INFO:
>>>
>>> On Ubuntu (AMI Datastax 2.1 and 2.2)
>>> Cassandra 1.1.2 (upgraded from 1.0.9)
>>> 2 node cluster (+ the ghost one)
>>> RF = 2
>>>
>>>
>>>
>>>
>>>
>>> --
>>> ............................................................
>>> Olivier Mallassi
>>> OCTO Technology
>>> ............................................................
>>> 50, Avenue des Champs-Elysées
>>> 75008 Paris
>>>
>>> Mobile: (33) 6 28 70 26 61
>>> Tél: (33) 1 58 56 10 00
>>> Fax: (33) 1 58 56 10 01
>>>
>>> http://www.octo.com
>>> Octo Talks! http://blog.octo.com
>>>
>>>
>>>
>>>
