incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Unreachable node, not in nodetool ring
Date Fri, 20 Jul 2012 09:30:52 GMT
Hi Aaron,

I have already run repair and cleanup on both nodes, and I did so after
every change to my ring (it took me a while, btw :)).

The node *.211 is actually out of the ring and out of my control
'cause I don't have the server anymore (EC2 instance terminated a few
days ago).

Alain

2012/7/20 aaron morton <aaron@thelastpickle.com>:
> I would:
>
> * run repair on 10.58.83.109
> * run cleanup on 10.59.21.241 (I assume this was the first node).
>
> It looks like 10.56.62.211 is out of the cluster.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:
>
> Not sure if this may help :
>
> nodetool -h localhost gossipinfo
> /10.58.83.109
>  RELEASE_VERSION:1.1.2
>  RACK:1b
>  LOAD:5.9384978406E10
>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>  DC:eu-west
>  STATUS:NORMAL,85070591730234615865843651857942052864
>  RPC_ADDRESS:0.0.0.0
> /10.248.10.94
>  RELEASE_VERSION:1.1.2
>  LOAD:3.0128207422E10
>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>  STATUS:LEFT,0,1342866804032
>  RPC_ADDRESS:0.0.0.0
> /10.56.62.211
>  RELEASE_VERSION:1.1.2
>  LOAD:11594.0
>  RACK:1b
>  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
>  DC:eu-west
>  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
>  STATUS:removed,170141183460469231731687303715884105727,1342453967415
>  RPC_ADDRESS:0.0.0.0
> /10.59.21.241
>  RELEASE_VERSION:1.1.2
>  RACK:1b
>  LOAD:1.08667047094E11
>  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>  DC:eu-west
>  STATUS:NORMAL,0
>  RPC_ADDRESS:0.0.0.0
>
> Story :
>
> I had a 2-node cluster:
>
> 10.248.10.94 Token 0
> 10.59.21.241 Token 85070591730234615865843651857942052864
>
> I had to replace node 10.248.10.94, so I added 10.56.62.211 on token 0 - 1
> (170141183460469231731687303715884105727). This failed, so I removed
> the token.
>
> I repeated the previous operation with the node 10.59.21.241 and it went
> fine. Next I decommissioned the node 10.248.10.94 and moved
> 10.59.21.241 to the token 0.
>
> Now I am in the situation described before.
>
> Alain
>
>
> 2012/7/19 Alain RODRIGUEZ <arodrime@gmail.com>:
>
> Hi, I wasn't able to see the token currently used by 10.56.62.211
> (the ghost node).
>
> I already removed the token 6 days ago:
>
> -> "Removing token 170141183460469231731687303715884105727 for
> /10.56.62.211"
>
> "- check in cassandra log. It is possible you see a log line telling
> you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
> token"
>
> Nothing like that in the logs.
>
> I tried the following without success:
>
> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
> Exception in thread "main" java.lang.UnsupportedOperationException:
> Token not found.
> ...
>
> I really thought this was going to work :-).
>
> Any other ideas?
>
> Alain
>
> PS: I heard that Octo is a nice company and you use Cassandra, so I
> guess you're fine in there :-). I wish you the best, thanks for your
> help.
>
> 2012/7/19 Olivier Mallassi <omallassi@octo.com>:
>
> I got that a couple of times (due to DNS issues in our infra).
>
> What you could try:
> - check in cassandra log. It is possible you see a log line telling you
>   10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token
> - if 10.56.62.211 is up, try decommission (via nodetool)
> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
> - use removetoken (via nodetool) to remove the token associated with
>   10.56.62.211. In case of failure, you can use removetoken -f instead.
>
> Then the unreachable IP should have disappeared.
>
> HTH
>
> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
> Hi,
>
> I tried to add a node a few days ago and it failed. I finally made it
> work with another node, but now when I describe cluster on the cli I get
> this:
>
> Cluster Information:
>   Snitch: org.apache.cassandra.locator.Ec2Snitch
>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>   Schema versions:
>      UNREACHABLE: [10.56.62.211]
>      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
>
> And nodetool ring gives me:
>
> Address         DC          Rack        Status State   Load            Owns                Token
>                                                                                            85070591730234615865843651857942052864
> 10.59.21.241    eu-west     1b          Up     Normal  101.17 GB       50.00%              0
> 10.58.83.109    eu-west     1b          Up     Normal  55.27 GB        50.00%              85070591730234615865843651857942052864
>
> The point, as you can see, is that one of my nodes holds twice as much
> data as the other one. I have RF = 2 defined.
>
> My guess is that the token 0 node keeps data for the unreachable node.
>
> The IP of the unreachable node doesn't belong to me anymore; I have no
> access to this ghost node.
>
> Does someone know how to completely remove this ghost node from my cluster?
>
> Thank you.
>
> Alain
>
> INFO:
>
> On Ubuntu (AMI Datastax 2.1 and 2.2)
> Cassandra 1.1.2 (upgraded from 1.0.9)
> 2 node cluster (+ the ghost one)
> RF = 2
>
> --
> ............................................................
> Olivier Mallassi
> OCTO Technology
> ............................................................
> 50, Avenue des Champs-Elysées
> 75008 Paris
>
> Mobile: (33) 6 28 70 26 61
> Tél: (33) 1 58 56 10 00
> Fax: (33) 1 58 56 10 01
>
> http://www.octo.com
> Octo Talks! http://blog.octo.com
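
For anyone following the token arithmetic in this thread: the values quoted above are consistent with a RandomPartitioner ring that wraps at 2**127. Below is a minimal sketch, assuming evenly spaced tokens of the form i * 2**127 / N. The names RING_SIZE, balanced_tokens and previous_token are made up for illustration and are not part of Cassandra or nodetool.

```python
# Assumption: the RandomPartitioner token ring wraps at 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def previous_token(token):
    """Return the token just before `token`, wrapping around the ring."""
    return (token - 1) % RING_SIZE


if __name__ == "__main__":
    # Two nodes: token 0 and token 2**126, as in the ring output quoted above.
    print(balanced_tokens(2))
    # "token 0 - 1" from the story: the last position on the ring.
    print(previous_token(0))
```

With two nodes this gives 0 and 85070591730234615865843651857942052864, and "token 0 - 1" comes out as 170141183460469231731687303715884105727, the token that was assigned to the ghost node.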

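
To spot leftover endpoints like the ghost node discussed above, the gossipinfo text can be scanned for any endpoint whose STATUS is not NORMAL. This is only a rough sketch, assuming the output layout shown in this thread (an endpoint line starting with "/", followed by KEY:VALUE lines). The functions parse_gossipinfo and non_normal_endpoints are hypothetical helpers, not nodetool features, and the sample is trimmed from the output quoted earlier.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {key: value}}."""
    endpoints = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A new endpoint section, e.g. "/10.56.62.211".
            current = line[1:]
            endpoints[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only; values may contain commas.
            key, value = line.split(":", 1)
            endpoints[current][key] = value
    return endpoints


def non_normal_endpoints(endpoints):
    """Return {endpoint: status} for endpoints whose STATUS is not NORMAL."""
    result = {}
    for address, state in endpoints.items():
        status = state.get("STATUS", "").split(",")[0]
        if status != "NORMAL":
            result[address] = status
    return result


# Trimmed from the gossipinfo output quoted in this thread.
SAMPLE = """\
/10.58.83.109
 STATUS:NORMAL,85070591730234615865843651857942052864
/10.248.10.94
 STATUS:LEFT,0,1342866804032
/10.56.62.211
 REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
 STATUS:removed,170141183460469231731687303715884105727,1342453967415
/10.59.21.241
 STATUS:NORMAL,0
"""

suspects = non_normal_endpoints(parse_gossipinfo(SAMPLE))
print(suspects)  # {'10.248.10.94': 'LEFT', '10.56.62.211': 'removed'}
```

This only reads what gossip already reports; it does not change cluster state, so it is safe to run against saved output before deciding on a removal step.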