Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
Received-SPF: pass (nike.apache.org: domain of metacret@gmail.com designates
 74.125.83.47 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKe7ALfXpSigMs=OwzinJjXtAwG6hxj3S8aAKOW60BPsZVKxTg@mail.gmail.com>
References: 
 <CAEH-zfo8Byx6+-Dc4aMn2rB=PBhiNtzRDfiEKfCowkKAjnWh-g@mail.gmail.com>
	<CE9E594E.8CF38%ben@zynga.com>
	<CAKe7ALfXpSigMs=OwzinJjXtAwG6hxj3S8aAKOW60BPsZVKxTg@mail.gmail.com>
Date: Tue, 5 Nov 2013 10:18:38 -0800
Message-ID: 
 <CAKe7ALcMkV8xFJmaUzcc3uHo0ZdakF6Da1V_7vGddzKRY0ThyQ@mail.gmail.com>
Subject: Re: How to join quorum without restarting existing servers
From: "Bae, Jae Hyeon" <metacret@gmail.com>
To: user@zookeeper.apache.org
Content-Type: multipart/mixed; boundary=089e01635356cd876a04ea720f34

--089e01635356cd876a04ea720f34
Content-Type: multipart/alternative; boundary=089e01635356cd876304ea720f32

--089e01635356cd876304ea720f32
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I am attaching the log file. Could you take a look at why the new instance
cannot join the quorum?
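
One way to narrow this down, sketched under assumptions: ZooKeeper 3.4.x
answers four-letter commands such as srvr on the client port, and the reply
from a serving node includes a "Mode:" line (for example leader or follower).
The Python below polls each server and reports that mode. The client port
2181 and the helper names are assumptions for illustration, not taken from
the actual setup.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection once it has sent its reply.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line, or None if there is none."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts, port=2181):
    """Map each host to its reported mode, or to an 'unreachable' note."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError as exc:  # covers refused connections and timeouts
            modes[host] = "unreachable: %s" % exc
    return modes
```

For example, ensemble_modes(["zk1.example.com", "zk2.example.com"]) (both
hostnames hypothetical) returns a dict from host to mode. A value of None
means the server answered but reported no Mode line, which would be
consistent with a node that has not joined the quorum.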


On Tue, Nov 5, 2013 at 9:52 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:

> Thanks a lot Ben
>
> We are also using ZooKeeper in AWS with elastic IPs. The reason I asked
> is that when the bad ZooKeeper EC2 instance is terminated and a new
> instance is launched with the previous elastic IP, it cannot join the
> quorum, and there are no specific error messages. But when I did a rolling
> restart, the new instance started normally, synchronized, and joined the
> quorum.
>
> As I understand German's response, the new instance should start,
> synchronize, and join the quorum successfully without any impact on the
> existing instances, but it didn't. I will investigate further.
>
> Thank you
> Best, Jae
>
>
> On Tue, Nov 5, 2013 at 8:24 AM, Ben Hall <ben@zynga.com> wrote:
>
>> Hi Jae,
>>
>> I wrote that article several years ago. (tbh - I hope it is not totally
>> out of date by now).  I agree with German's points.
>>
>> The problem it solved was how to replace a bad server without having to
>> shut down the ensemble and without having to update the config files on
>> each server. I would also add that this only works as long as the server
>> names and ports are the same - iirc at the time the article was written we
>> were using servers in AWS and referencing them either by assigned
>> hostnames such as zookeeper-[01|11] or by elastic IPs that could be moved
>> from server to server.
>>
>> If I understand your question correctly, if you are "adding a new server"
>> such as going from 7 to 9 servers, then this approach won't benefit you.
>>
>> We also used this approach when we would upgrade the servers, but like
>> German said we did it one server at a time so that the Leader election
>> could be natural.  This allowed us to upgrade a pool of 11 servers that
>> were responsible for many thousands of client connections without any
>> downtime.
>>
>> Thanks
>> Ben
>>
>>
>> On 11/5/13 6:51 AM, "German Blanco" <german.blanco.blanco@gmail.com>
>> wrote:
>>
>> >... and make sure that there is no rubbish in the data dir of the new
>> >server.
>> >
>> >
>> >On Tue, Nov 5, 2013 at 3:49 PM, German Blanco <
>> >german.blanco.blanco@gmail.com> wrote:
>> >
>> >> Hello Jae,
>> >>
>> >> I think that the answer to your question is "no, there is no benefit in
>> >> a rolling restart in that case".
>> >> If you remove a machine that was hosting a zookeeper server that was
>> >> part of a cluster, and replace it with a new machine, with a zookeeper
>> >> server running the same software version and listening on the same IP
>> >> and ports, then this new server will join the cluster, synchronize and
>> >> start working normally.
>> >> I wouldn't recommend replacing more than one server at a time, and I
>> >> think that it is better if the new server joins while the existing
>> >> quorum is stable (avoid leader elections while the new server joins,
>> >> i.e. avoid restarts or disconnections of the existing servers).
>> >>
>> >> Best regards,
>> >>
>> >> Germán.
>> >>
>> >>
>> >> On Tue, Nov 5, 2013 at 6:42 AM, Bae, Jae Hyeon <metacret@gmail.com>
>> >>wrote:
>> >>
>> >>> Hi
>> >>>
>> >>> I read an article
>> >>>
>> >>> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
>> >>>
>> >>> My question is, when failed hardware is replaced using the same IP
>> >>> address, do I need to do a rolling restart to add the replaced
>> >>> hardware to the quorum?
>> >>>
>> >>> I am using ZooKeeper version 3.4.5.
>> >>>
>> >>> Thank you
>> >>> Best, Jae
>> >>>
>> >>
>> >>
>>
>>
>

--089e01635356cd876304ea720f32--

--089e01635356cd876a04ea720f34--