From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Recovering from a faulty cassandra node
Date: Fri, 22 Mar 2013 05:58:29 +1300

> Not sure if I needed to change cassandra-topology.properties file on the existing nodes.

If you are using the PropertyFileSnitch, all nodes need to have the same cassandra-topology.properties file.
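
One way to check this is to compare the copies of the file collected from each node. The following is only a minimal sketch: the function names are hypothetical, the file contents are passed in as strings (how they are fetched from each node is left out), and it assumes the usual `IP=DC:RACK` line format with `#` comments.

```python
import hashlib
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Drop comments and blank lines, remove spaces, and sort the entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line.replace(" ", ""))
    return sorted(entries)


def fingerprint(text: str) -> str:
    """Hash the normalized entries so that copies can be compared cheaply."""
    joined = "\n".join(normalize(text))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def nodes_out_of_sync(files: Dict[str, str]) -> List[str]:
    """Return the nodes whose file differs from the most common version."""
    prints = {node: fingerprint(text) for node, text in files.items()}
    counts: Dict[str, int] = {}
    for fp in prints.values():
        counts[fp] = counts.get(fp, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(node for node, fp in prints.items() if fp != majority)
```

An empty result means every copy carries the same entries. Comment and ordering differences are ignored on purpose, since only the entries themselves matter to the snitch.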

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/03/2013, at 1:34 AM, Jabbar Azam <ajazam@gmail.com> wrote:

I've added the node with a different IP address and, after disabling the firewall, data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by a removenode, once it's done.

I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on, which is why the wiped node couldn't see the cluster. Not sure if I needed to change the cassandra-topology.properties file on the existing nodes.
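
A firewall problem like the iptables one above can be spotted by testing whether the inter-node port is reachable from the new node. This is a rough sketch only: `port_open` is a hypothetical helper, and 7000 is the default `storage_port` in cassandra.yaml, which may have been changed on a given cluster.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unreachable all count as closed.
        return False
```

For example, `port_open("10.0.0.2", 7000)` run on the wiped node (the address is a placeholder) would return False while a firewall is dropping traffic to an existing node.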




On 19 March 2013 15:49, Jabbar Azam <ajazam@gmail.com> wrote:
Do I use removenode before adding the reinstalled node or after?


On 19 March 2013 15:45, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
In 1.2, you may want to use nodetool removenode if your server is broken or unreachable; otherwise I guess nodetool decommission remains the right way to remove a node (http://www.datastax.com/docs/1.2/references/nodetool).

When this node is out, run rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (not sure about the auto_bootstrap param), and start Cassandra on that node again. It should join the ring as a new node.
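
The steps above can be written down as an ordered checklist. This is a sketch under assumptions, not a tested procedure: `plan_node_reset` is a hypothetical helper that only builds strings and runs nothing, the host ID is assumed to be the one reported by `nodetool status`, and the data path is whatever the installation uses.

```python
from typing import List


def plan_node_reset(node_reachable: bool, host_id: str, data_path: str) -> List[str]:
    """Return the ordered steps for removing, wiping, and re-adding a node."""
    if node_reachable:
        # decommission is run on the node that is leaving the ring.
        removal = "on the leaving node: nodetool decommission"
    else:
        # removenode is run from a live node and takes the dead node's host ID.
        removal = "on a live node: nodetool removenode " + host_id
    return [
        removal,
        "on the wiped node: rm -rf " + data_path.rstrip("/") + "/*",
        "on the wiped node: review cassandra.yaml, then start Cassandra",
    ]
```

Printing the result of `plan_node_reset(False, "<host-id>", "/yourpath/cassandra")` gives the removenode variant, which matches the case where the server is broken or unreachable.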

Good luck.


2013/3/19 Hiller, Dean <Dean.Hiller@nrel.gov>

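Since you "cleared" out that node, it IS the replacement node.

Dean

From: Jabbar Azam <ajazam@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, March 19, 2013 9:29 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Recovering from a faulty cassandra node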

Hello Dean.

I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node.


On 19 March 2013 15:25, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
I have not done this yet, but from all that I have read your best option is to follow the replace node documentation, for which I believe you need to:


 1.  Have the token be the same BUT add 1 to it so it doesn't think it's the same computer
 2.  Have the bootstrap option set or something so streaming takes effect.

I would, however, test all of that out in QA to make sure it works. If you use QUORUM reads/writes, a good part of that test would be to take node X down after your node Y is back in the cluster, to make sure reads/writes are working on the node you fixed... you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range.
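
The arithmetic behind that test can be sketched as follows. QUORUM needs a majority of the replicas for a key, so whether taking node X down still lets reads/writes succeed depends on the replication factor, which is not stated in this thread; the values used below are illustrative assumptions only.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def quorum_survives(replication_factor: int, replicas_down: int) -> bool:
    """True if enough replicas remain up to satisfy QUORUM."""
    return replication_factor - replicas_down >= quorum(replication_factor)
```

With a replication factor of 3, QUORUM is 2, so one replica can be down and the test is meaningful. With a replication factor of 2, QUORUM is also 2, so taking node X down would make QUORUM operations on shared ranges fail regardless of whether node Y was rebuilt correctly.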

Dean

From: Jabbar Azam <ajazam@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, March 19, 2013 8:51 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Recovering from a faulty cassandra node

Hello,

I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I spent over a week inserting lots of data into the cluster. Towards the end of the process one of the nodes had a hardware fault.

I have fixed the hardware fault but the file system on that node is corrupt, so I'll have to reinstall the OS and Cassandra.

I can think of two ways of reintegrating the host into the cluster:

1) Shrink the cluster to three nodes and add the node into the cluster

2) Add the node into the cluster without shrinking

I'm not sure of the best approach to take and I'm not sure how to achieve each step.

Can anybody help?


--
Thanks

 Jabbar Azam


