Subject: Re: problem removing dead node from ring
From: Curious Patient
To: user@cassandra.apache.org
Date: Sat, 7 Jun 2014 21:18:26 -0400

Hey all,

OK, I gave removing the downed node from the Cassandra ring another try.

To recap what's going on, this is what my ring looks like with nodetool status:

[root@beta-new:~] #nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.10.1.94  178.38 KB  256     49.4%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1
DN  10.10.1.98  ?          256     50.6%  f2a48fc7-a362-43f5-9061-4bb3739fdeaf  rack1
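
For anyone who wants to pick the down node out of that output in a script, here is a minimal sketch. It assumes the column layout shown above (status code first, address second, Host ID and Rack last); the function name parse_nodetool_status is made up for illustration and is not part of Cassandra.

```python
from typing import Dict, List


def parse_nodetool_status(text: str) -> List[Dict[str, str]]:
    """Parse the node rows of `nodetool status` output into dicts."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        # Node rows start with a two-letter code:
        # U/D (up/down) followed by N/L/J/M (normal/leaving/joining/moving).
        if len(parts) < 2 or len(parts[0]) != 2:
            continue
        if parts[0][0] not in "UD" or parts[0][1] not in "NLJM":
            continue
        nodes.append({
            "status": parts[0],
            "address": parts[1],
            # Host ID and Rack are the last two columns, even when Load is "?".
            "host_id": parts[-2],
            "rack": parts[-1],
        })
    return nodes


# The output quoted in this message.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.10.1.94  178.38 KB  256     49.4%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1
DN  10.10.1.98  ?          256     50.6%  f2a48fc7-a362-43f5-9061-4bb3739fdeaf  rack1
"""

if __name__ == "__main__":
    for node in parse_nodetool_status(SAMPLE):
        if node["status"].startswith("D"):
            print(node["address"], node["host_id"])
```

Run against the output above, this reports 10.10.1.98 with its host ID as the only down node.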

So I followed the steps in this document one more time:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

And set up the following in cassandra.yaml according to the above instructions:

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
listen_address: 10.10.1.153
auto_bootstrap: yes
broadcast_address: 10.10.1.153
endpoint_snitch: SimpleSnitch
initial_token: -9173731940639284976


The initial_token is the one belonging to the dead node that I'm trying to get rid of.

I then make sure that the /var/lib/cassandra directory is completely empty and run this startup command:

[root@cassandra1 cassandrahome]# ./bin/cassandra -Dcassandra.replace_address=10.10.1.98 -f

using the IP of the node I want to remove as the value of cassandra.replace_address.
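
The two steps above (confirm the data directory is empty, then start with the replace flag) can be sketched as a small pre-flight check. This is only an illustration under the paths and IP used in this message; data_dir_is_empty and replace_command are hypothetical helper names, and the script only prints the command rather than launching Cassandra.

```python
import os
from typing import List


def data_dir_is_empty(path: str) -> bool:
    """Return True if the directory is missing or contains no entries."""
    return not os.path.isdir(path) or not os.listdir(path)


def replace_command(dead_ip: str, cassandra_bin: str = "./bin/cassandra") -> List[str]:
    """Build the foreground startup command that replaces dead_ip."""
    return [cassandra_bin, f"-Dcassandra.replace_address={dead_ip}", "-f"]


if __name__ == "__main__":
    data_dir = "/var/lib/cassandra"  # directory named in this message
    if data_dir_is_empty(data_dir):
        # Print the command instead of running it, so nothing starts by accident.
        print(" ".join(replace_command("10.10.1.98")))
    else:
        print(f"{data_dir} is not empty; not starting a replacement node")
```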

And when I do, this is the error I get:

java.lang.RuntimeException: Cannot replace_address /10.10.1.98 because it doesn't exist in gossip
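
To see which endpoints a live node's gossip actually knows about, one option is to run nodetool gossipinfo on that node and scan its output. The sketch below is hedged: it assumes each endpoint appears on an unindented line of the form /ip or hostname/ip, with the per-endpoint attributes indented beneath it, which may differ between versions. The helper name and the sample text are illustrative only and were not captured from this cluster.

```python
from typing import Set


def endpoints_in_gossip(gossipinfo_text: str) -> Set[str]:
    """Collect endpoint IPs from `nodetool gossipinfo`-style output."""
    endpoints = set()
    for line in gossipinfo_text.splitlines():
        # Assumption: endpoint lines are unindented and contain a "/";
        # attribute lines (generation, heartbeat, STATUS, ...) are indented.
        if line and not line[0].isspace() and "/" in line:
            endpoints.add(line.strip().rsplit("/", 1)[-1])
    return endpoints


# Illustrative sample only, not real output from the cluster in this thread.
SAMPLE_GOSSIP = """\
/10.10.1.94
  generation:1402190000
  heartbeat:42
  STATUS:NORMAL,-1
"""

if __name__ == "__main__":
    known = endpoints_in_gossip(SAMPLE_GOSSIP)
    dead_ip = "10.10.1.98"
    if dead_ip in known:
        print(f"{dead_ip} is present in gossip")
    else:
        print(f"{dead_ip} is not present in gossip")
```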


So how can I get Cassandra to realize that this node needs to be replaced and that it SHOULDN'T exist in gossip because the node is down? That would seem obvious to me, so why isn't it obvious to her? :)


Thanks

Tim


On Wed, Jun 4, 2014 at 4:36 PM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Tue, Jun 3, 2014 at 9:03 PM, Matthew Allen <matthew.j.allen@gmail.com> wrote:
>
>> Thanks Robert, this makes perfect sense. Do you know if CASSANDRA-6961
>> will be ported to 1.2.x ?
>
> I just asked driftx, he said "not gonna happen."
>
>> And apologies if these appear to be dumb questions, but is a repair more
>> suitable than a rebuild because the rebuild only contacts 1 replica (per
>> range), which may itself contain stale data ?
>
> Exactly that.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2434
>
> Discusses related issues in quite some detail. The tl;dr is that until
> 2434 is resolved, streams do not necessarily come from the node departing
> the range, and therefore the "unique replica count" is decreased by
> changing cluster topology.
>
> =Rob
