From: "Bae, Jae Hyeon"
To: user@zookeeper.apache.org
Date: Tue, 5 Nov 2013 10:18:38 -0800
Subject: Re: How to join quorum without restarting existing servers

I am attaching the log file. Could you take a look at why the new instance cannot join the quorum?
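
While investigating, it may help to see which role each server reports. The following is a minimal sketch, assuming the servers accept the `srvr` four-letter command on the client port (2181 here is an assumption, as are the helper names `parse_mode` and `query_mode`). A server that has not joined the quorum typically answers that it is not currently serving requests, in which case no `Mode:` line is found and the helper returns `None`.

```python
import socket


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line in `srvr` output, or None if absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host, port=2181, timeout=5.0):
    """Send the `srvr` four-letter command to one server and return its reported mode.

    The port and timeout defaults are assumptions; adjust them to the ensemble.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))
```

Calling `query_mode("zookeeper-01")` (a hypothetical hostname) for every member, including the replaced one, shows whether it reports `leader`, `follower`, or nothing at all. Connection failures surface as `OSError`, which the caller can catch per host.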


On Tue, Nov 5, 2013 at 9:52 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
Thanks a lot Ben

We are also using ZooKeeper in AWS with elastic IPs. The reason I asked this question is that when the bad ZooKeeper EC2 instance is terminated and a new instance is launched with the previous elastic IP, it cannot join the quorum, and there are no specific error messages. But when I did a rolling restart, the new instance started normally, synchronized, and joined the quorum.

As I understand German's response, the new instance should start, synchronize, and join the quorum successfully without any impact on the existing instances, but it didn't. I will investigate further.

Thank you
Best, Jae


On Tue, Nov 5, 2013 at 8:24 AM, Ben Hall <ben@zynga.com> wrote:
Hi Jae,

I wrote that article several years ago. (tbh - I hope it is not totally
out of date by now). I agree with German's points.

The issue it was solving was to replace a bad server without having to
shut down the ensemble and without having to update the config files on
each server. I would also add that this only works as long as the server
names and ports are the same - iirc at the time the article was written we
were using servers in AWS and referencing them either by assigned
hostnames such as zookeeper-[01|11] or by elastic IPs that could be moved
from server to server.

If I understand your question correctly, if you are "adding a new server"
such as going from 7 to 9 servers, then this approach won't benefit you.

We also used this approach when we would upgrade the servers, but like
German said we did it one server at a time so that the Leader election
could be natural. This allowed us to upgrade a pool of 11 servers that
were responsible for many thousands of client connections without any
downtime.

Thanks
Ben


On 11/5/13 6:51 AM, "German Blanco" <german.blanco.blanco@gmail.com> wrote:

>... and make sure that there is no rubbish in the data dir of the new
>server.
>
>
>On Tue, Nov 5, 2013 at 3:49 PM, German Blanco <german.blanco.blanco@gmail.com> wrote:
>
>> Hello Jae,
>>
>> I think that the answer to your question is "no, there is no benefit in
>> a rolling restart in that case".
>> If you remove a machine that was hosting a zookeeper server that was
>> part of a cluster, and replace it with a new machine, with a zookeeper
>> server running the same software version and listening on the same IP
>> and ports, then this new server will join the cluster, synchronize and
>> start working normally.
>> I wouldn't recommend replacing more than one server at a time, and I
>> think that it is better if the new server joins while the existing
>> quorum is stable (avoid leader elections while the new server joins,
>> i.e. avoid restarts or disconnections of the existing servers).
>>
>> Best regards,
>>
>> Germán.
>>
>>
>> On Tue, Nov 5, 2013 at 6:42 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I read an article
>>>
>>>
>>> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
>>>
>>> My question is, even though the failed hardware is replaced with the
>>> same IP address, do I need to do a rolling restart to add the
>>> replacement hardware to the quorum?
>>>
>>> I am using ZooKeeper version 3.4.5.
>>>
>>> Thank you
>>> Best, Jae
>>>
>>
>>


