From: aaron morton
Subject: Re: Unreachable node, not in nodetool ring
Date: Fri, 20 Jul 2012 21:06:56 +1200
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Message-Id: <767E9D1D-E3F5-4435-956C-75C94A60D04A@thelastpickle.com>
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: multipart/alternative; boundary="Apple-Mail=_70BFEB37-78AB-4E75-8CC4-714024E1D4BF"
X-Mailer: Apple Mail (2.1278)

--Apple-Mail=_70BFEB37-78AB-4E75-8CC4-714024E1D4BF
Content-Type: text/plain; charset=iso-8859-1

I would:

* run repair on 10.58.83.109
* run cleanup on 10.59.21.241 (I assume this was the first node).

It looks like 10.56.62.211 is out of the cluster.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:

> Not sure if this may help :
>
> nodetool -h localhost gossipinfo
> /10.58.83.109
>   RELEASE_VERSION:1.1.2
>   RACK:1b
>   LOAD:5.9384978406E10
>   SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>   DC:eu-west
>   STATUS:NORMAL,85070591730234615865843651857942052864
>   RPC_ADDRESS:0.0.0.0
> /10.248.10.94
>   RELEASE_VERSION:1.1.2
>   LOAD:3.0128207422E10
>   SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>   STATUS:LEFT,0,1342866804032
>   RPC_ADDRESS:0.0.0.0
> /10.56.62.211
>   RELEASE_VERSION:1.1.2
>   LOAD:11594.0
>   RACK:1b
>   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
>   DC:eu-west
>   REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
>   STATUS:removed,170141183460469231731687303715884105727,1342453967415
>   RPC_ADDRESS:0.0.0.0
> /10.59.21.241
>   RELEASE_VERSION:1.1.2
>   RACK:1b
>   LOAD:1.08667047094E11
>   SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
>   DC:eu-west
>   STATUS:NORMAL,0
>   RPC_ADDRESS:0.0.0.0
>
> Story :
>
> I had a 2 node cluster
>
> 10.248.10.94 Token 0
> 10.59.21.241 Token 85070591730234615865843651857942052864
>
> Had to replace node 10.248.10.94 so I added 10.56.62.211 on token 0 - 1
> (170141183460469231731687303715884105727). This failed, so I removed the
> token.
>
> I repeated the previous operation with the node 10.59.21.241 and it went
> fine. Next I decommissioned the node 10.248.10.94 and moved
> 10.59.21.241 to the token 0.
>
> Now I am in the situation described before.
>
> Alain
>
>
> 2012/7/19 Alain RODRIGUEZ <arodrime@gmail.com>:
>> Hi, I wasn't able to see the token used currently by the 10.56.62.211
>> (ghost node).
>>
>> I already removed the token 6 days ago :
>>
>> -> "Removing token 170141183460469231731687303715884105727 for /10.56.62.211"
>>
>> "- check in cassandra log. It is possible you see a log line telling
>> you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
>> token"
>>
>> Nothing like that in the logs
>>
>> I tried the following without success :
>>
>> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
>> Exception in thread "main" java.lang.UnsupportedOperationException:
>> Token not found.
>> ...
>>
>> I really thought this was going to work :-).
>>
>> Any other ideas ?
>>
>> Alain
>>
>> PS : I heard that Octo is a nice company and you use Cassandra so I
>> guess you're fine in there :-). I wish you the best, thanks for your
>> help.
>>
>> 2012/7/19 Olivier Mallassi <omallassi@octo.com>:
>>> I got that a couple of times (due to DNS issues in our infra)
>>>
>>> what you could try
>>> - check in cassandra log. It is possible you see a log line telling you
>>> 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token
>>> - if 10.56.62.211 is up, try decommission (via nodetool)
>>> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
>>> - use removetoken (via nodetool) to remove the token associated with
>>> 10.56.62.211. in case of failure, you can use removetoken -f instead.
>>>
>>> then, the unreachable IP should have disappeared.
>>>
>>>
>>> HTH
>>>
>>> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ <arodrime@gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I tried to add a node a few days ago and it failed. I finally made it
>>>> work with another node but now when I describe cluster on cli I got
>>>> this :
>>>>
>>>> Cluster Information:
>>>>    Snitch: org.apache.cassandra.locator.Ec2Snitch
>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>    Schema versions:
>>>>       UNREACHABLE: [10.56.62.211]
>>>>       e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
>>>>
>>>> And nodetool ring gives me :
>>>>
>>>> Address         DC          Rack        Status State   Load            Owns                Token
>>>>                                                                                            85070591730234615865843651857942052864
>>>> 10.59.21.241    eu-west     1b          Up     Normal  101.17 GB       50.00%              0
>>>> 10.58.83.109    eu-west     1b          Up     Normal  55.27 GB        50.00%              85070591730234615865843651857942052864
>>>>
>>>> The point, as you can see, is that one of my nodes has twice the
>>>> information of the second one. I have a RF = 2 defined.
>>>>
>>>> My guess is that the token 0 node keeps data for the unreachable node.
>>>>
>>>> The IP of the unreachable node doesn't belong to me anymore, I have no
>>>> access to this ghost node.
>>>>
>>>> Does someone know how to completely remove this ghost node from my cluster?
>>>>
>>>> Thank you.
>>>>
>>>> Alain
>>>>
>>>> INFO :
>>>>
>>>> On ubuntu (AMI Datastax 2.1 and 2.2)
>>>> Cassandra 1.1.2 (upgraded from 1.0.9)
>>>> 2 node cluster (+ the ghost one)
>>>> RF = 2
>>>
>>>
>>> --
>>> ............................................................
>>> Olivier Mallassi
>>> OCTO Technology
>>> ............................................................
>>> 50, Avenue des Champs-Elysées
>>> 75008 Paris
>>>
>>> Mobile: (33) 6 28 70 26 61
>>> Tél: (33) 1 58 56 10 00
>>> Fax: (33) 1 58 56 10 01
>>>
>>> http://www.octo.com
>>> Octo Talks! http://blog.octo.com

--Apple-Mail=_70BFEB37-78AB-4E75-8CC4-714024E1D4BF
Content-Type: text/html; charset=iso-8859-1

I would:

* run repair on 10.58.83.109
* run cleanup on 10.59.21.241 (I assume this was the first node).

It looks like 10.56.62.211 is out of the cluster.

Cheers

http://www.thelastpickle.com

On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:

Not sure if this may help :

nodetool -h localhost gossipinfo
/10.58.83.109
  RELEASE_VERSION:1.1.2
  RACK:1b
  LOAD:5.9384978406E10
  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
  DC:eu-west
  STATUS:NORMAL,85070591730234615865843651857942052864
  RPC_ADDRESS:0.0.0.0
/10.248.10.94
  RELEASE_VERSION:1.1.2
  LOAD:3.0128207422E10
  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
  STATUS:LEFT,0,1342866804032
  RPC_ADDRESS:0.0.0.0
/10.56.62.211
  RELEASE_VERSION:1.1.2
  LOAD:11594.0
  RACK:1b
  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
  DC:eu-west
  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
  STATUS:removed,170141183460469231731687303715884105727,1342453967415
  RPC_ADDRESS:0.0.0.0
/10.59.21.241
  RELEASE_VERSION:1.1.2
  RACK:1b
  LOAD:1.08667047094E11
  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
  DC:eu-west
  STATUS:NORMAL,0
  RPC_ADDRESS:0.0.0.0
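
As a rough aid for reading output like the above, here is a minimal Python sketch that splits gossipinfo text into per-endpoint fields and lists the endpoints whose STATUS is not NORMAL. The function names and the trimmed SAMPLE text are illustrative and not part of nodetool; the parsing only assumes the `/address` plus `KEY:value` layout shown in this paste.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {key: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/10.58.83.109" starts a new endpoint section.
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only; values may contain commas.
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def non_normal_endpoints(nodes):
    """Return {endpoint: state} for endpoints whose STATUS is not NORMAL."""
    result = {}
    for endpoint, fields in nodes.items():
        state = fields.get("STATUS", "").split(",", 1)[0]
        if state != "NORMAL":
            result[endpoint] = state
    return result


# Trimmed copy of the STATUS-related lines from the paste above.
SAMPLE = """\
/10.58.83.109
  STATUS:NORMAL,85070591730234615865843651857942052864
/10.248.10.94
  STATUS:LEFT,0,1342866804032
/10.56.62.211
  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
  STATUS:removed,170141183460469231731687303715884105727,1342453967415
/10.59.21.241
  STATUS:NORMAL,0
"""

if __name__ == "__main__":
    print(non_normal_endpoints(parse_gossipinfo(SAMPLE)))
```

On the sample this flags 10.248.10.94 (LEFT) and 10.56.62.211 (removed), which is the same picture the raw paste gives.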

Story :

I had a 2 node cluster

10.248.10.94 Token 0
10.59.21.241 Token 85070591730234615865843651857942052864

Had to replace node 10.248.10.94 so I added 10.56.62.211 on token 0 - 1
(170141183460469231731687303715884105727). This failed, so I removed the
token.

I repeated the previous operation with the node 10.59.21.241 and it went
fine. Next I decommissioned the node 10.248.10.94 and moved
10.59.21.241 to the token 0.

Now I am in the situation described before.

Alain
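
The token values in this story can be reproduced with a little arithmetic. A minimal sketch, assuming the RandomPartitioner token space wraps at 2**127 (an assumption that matches the numbers quoted in this thread); `balanced_tokens` and `token_before` are hypothetical helper names:

```python
# Assumption: tokens live in [0, 2**127) and wrap around at the top.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes, starting at 0."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def token_before(token):
    """The token immediately before `token`, wrapping around the ring."""
    return (token - 1) % RING_SIZE


if __name__ == "__main__":
    print(balanced_tokens(2))
    print(token_before(0))
```

For two nodes this gives 0 and 85070591730234615865843651857942052864, and "token 0 - 1" wraps to 170141183460469231731687303715884105727, the token the failed node was given.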


2012/7/19 Alain RODRIGUEZ <arodrime@gmail.com>:
Hi, I wasn't able to see the token used currently by the 10.56.62.211
(ghost node).

I already removed the token 6 days ago :

-> "Removing token 170141183460469231731687303715884105727 for /10.56.62.211"

"- check in cassandra log. It is possible you see a log line telling
you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
token"

Nothing like that in the logs

I tried the following without success :

$ nodetool -h localhost removetoken 170141183460469231731687303715884105727
Exception in thread "main" java.lang.UnsupportedOperationException:
Token not found.
...

I really thought this was going to work :-).

Any other ideas ?

Alain

PS : I heard that Octo is a nice company and you use Cassandra so I
guess you're fine in there :-). I wish you the best, thanks for your
help.

2012/7/19 Olivier Mallassi <omallassi@octo.com>:
I got that a couple of times (due to DNS issues in our infra)

what you could try
- check in cassandra log. It is possible you see a log line telling you
10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token
- if 10.56.62.211 is up, try decommission (via nodetool)
- if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
- use removetoken (via nodetool) to remove the token associated with
10.56.62.211. in case of failure, you can use removetoken -f instead.

then, the unreachable IP should have disappeared.
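
The up/down decision in the steps above could be sketched like this in Python. It only builds the nodetool command lines and does not run anything; `plan_ghost_removal` and its parameters are hypothetical names, and the token move and the forced removal mentioned above are deliberately left out.

```python
def plan_ghost_removal(ghost_host, ghost_token, ghost_is_up, live_host="localhost"):
    """Return the nodetool command lines (as argument lists) to try first.

    Assumption: a reachable node is asked to decommission itself, while an
    unreachable one has its token removed from a live node instead.
    """
    if ghost_is_up:
        # decommission is issued against the node that is leaving the ring.
        return [["nodetool", "-h", ghost_host, "decommission"]]
    # The ghost cannot be contacted, so ask a live node to drop its token.
    return [["nodetool", "-h", live_host, "removetoken", str(ghost_token)]]


if __name__ == "__main__":
    plan = plan_ghost_removal(
        ghost_host="10.56.62.211",
        ghost_token=170141183460469231731687303715884105727,
        ghost_is_up=False,
    )
    for command in plan:
        print(" ".join(command))
```

Keeping the plan as plain lists makes it easy to review before handing each entry to something like `subprocess.run`.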


HTH

On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ <arodrime@gmail.com>
wrote:

Hi,

I tried to add a node a few days ago and it failed. I finally made it
work with another node but now when I describe cluster on cli I got
this :

Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
      UNREACHABLE: [10.56.62.211]
      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]

And nodetool ring gives me :

Address         DC          Rack        Status State   Load            Owns                Token
                                                                                           85070591730234615865843651857942052864
10.59.21.241    eu-west     1b          Up     Normal  101.17 GB       50.00%              0
10.58.83.109    eu-west     1b          Up     Normal  55.27 GB        50.00%              85070591730234615865843651857942052864

The point, as you can see, is that one of my nodes has twice the
information of the second one. I have a RF = 2 defined.

My guess is that the token 0 node keeps data for the unreachable node.
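
For what it is worth, the share of data each node should hold can be worked out from the tokens and the replication factor. A minimal sketch, assuming a 2**127 token space and replicas placed on the next nodes clockwise around the ring (the thread does not say which replication strategy is in use); the function names are hypothetical:

```python
from fractions import Fraction

# Assumption: tokens live in [0, 2**127) and wrap around at the top.
RING_SIZE = 2 ** 127


def primary_ownership(tokens):
    """Fraction of the ring each token owns: the range (previous token, token]."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A single node owns the whole ring, hence the "or RING_SIZE".
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


def replicated_share(tokens, replication_factor):
    """Fraction of all data each token's node stores, replicas included."""
    ordered = sorted(tokens)
    primary = primary_ownership(ordered)
    count = len(ordered)
    copies = min(replication_factor, count)
    total = {}
    for i, token in enumerate(ordered):
        # A node stores its own range plus the ranges of the
        # (copies - 1) nodes that precede it on the ring.
        total[token] = sum(primary[ordered[(i - k) % count]] for k in range(copies))
    return total


if __name__ == "__main__":
    for token, share in replicated_share([0, 2 ** 126], 2).items():
        print(token, float(share))
```

With the two tokens from the ring output and RF = 2, both nodes come out at 100% of the data, so under these assumptions the two loads would be expected to be roughly equal; a gap like 101.17 GB against 55.27 GB is consistent with one node still carrying data it no longer needs.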

The IP of the unreachable node doesn't belong to me anymore, I have no
access to this ghost node.

Does someone know how to completely remove this ghost node from my cluster?

Thank you.

Alain

INFO :

On ubuntu (AMI Datastax 2.1 and 2.2)
Cassandra 1.1.2 (upgraded from 1.0.9)
2 node cluster (+ the ghost one)
RF = 2




--
............................................................
Olivier Mallassi
OCTO Technology
............................................................
50, Avenue des Champs-Elysées
75008 Paris

Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01

http://www.octo.com
Octo Talks! http://blog.octo.com



--Apple-Mail=_70BFEB37-78AB-4E75-8CC4-714024E1D4BF--