From: Andrey Ilinykh
To: user@cassandra.apache.org
Date: Thu, 13 Feb 2014 14:43:41 -0800
Subject: Re: Cass 1.2.11: Replacing a node procedure

decommission
http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node

On Thu, Feb 13, 2014 at 2:28 PM, Oleg Dulin wrote:

> Here is what I am thinking.
>
> 1) Add the new node with token-1 of the old one and let it bootstrap.
> 2) Once it has bootstrapped, remove the old node from the ring.
>
> Now, it is #2 that I need clarification on.
>
> Do I use "decommission" or "remove"? How long should I expect those
> processes to run?
>
> Regards,
> Oleg
>
> On 2014-02-13 22:01:10 +0000, Oleg Dulin said:
>
>> Dear Distinguished Colleagues:
>>
>> I have a situation where, in the production environment, one of the
>> machines is overheating and needs to be serviced. The landscape looks
>> like this:
>>
>> 4 machines in the primary DC, 4 machines in the DR DC. Replication
>> factor is 2.
>>
>> I also have a QA environment with 4 machines in a single DC, RF=2 as
>> well.
>>
>> We need to work with the manufacturer to figure out what is wrong with
>> the machine. The proposed course of action is the following:
>>
>> 1) Take the faulty prod machine (let's call it X) out of production.
>> 2) Take a healthy QA machine (let's call it Y) out of QA.
>> 3) Plug the QA machine into the prod cluster and rebuild it.
>> 4) Plug the prod machine into the QA cluster, leave it alone, and let
>>    the manufacturer service it to their liking until they say it is
>>    fixed, at which point we will just leave it in QA.
>>
>> So basically we are talking about replacing a dead node.
>>
>> I found this:
>> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_replace_node_t.html
>>
>> I am not using vnodes, just plain vanilla tokens and RandomPartitioner,
>> so that procedure doesn't apply. I need some help putting together a
>> step-by-step checklist of what I would need to do.
>
> --
> Regards,
> Oleg Dulin
> http://www.olegdulin.com


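The "token-1" step in the quoted plan is plain integer arithmetic on the old node's token. The sketch below is illustrative only and is not taken from the thread: it assumes RandomPartitioner without vnodes, a token space of 0 to 2**127, and evenly spaced example tokens for a four-node ring. The function names are hypothetical, and the real cluster's tokens should be read from the ring itself rather than computed this way.

```python
# Sketch only: token arithmetic for a non-vnode RandomPartitioner ring.
# Assumptions (not from the thread): token space of 0 .. 2**127 and
# evenly spaced example tokens; function names are hypothetical.

RING_SIZE = 2 ** 127  # assumed size of the RandomPartitioner token space


def evenly_spaced_tokens(node_count):
    """Return one example initial token per node, spaced evenly on the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def replacement_token(old_token):
    """Return old_token - 1, wrapping around the ring when old_token is 0."""
    return (old_token - 1) % RING_SIZE


if __name__ == "__main__":
    # Print each example token next to the token a replacement node would take.
    for old in evenly_spaced_tokens(4):
        print(old, "->", replacement_token(old))
```

The wrap-around case matters only for a node that owns token 0; for every other token the result is simply the old value minus one.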