Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
Received-SPF: pass (athena.apache.org: domain of metacret@gmail.com designates
 74.125.83.47 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEH-zfqJ-jRUSB7GZj7FKb-wnsZY1aft+_GMq2d+3oL3CRS+Cw@mail.gmail.com>
References: 
 <CAEH-zfo8Byx6+-Dc4aMn2rB=PBhiNtzRDfiEKfCowkKAjnWh-g@mail.gmail.com>
	<CE9E594E.8CF38%ben@zynga.com>
	<CAKe7ALfXpSigMs=OwzinJjXtAwG6hxj3S8aAKOW60BPsZVKxTg@mail.gmail.com>
	<CAKe7ALcMkV8xFJmaUzcc3uHo0ZdakF6Da1V_7vGddzKRY0ThyQ@mail.gmail.com>
	<CAFOSzKHzHnjjNm1P6=YvUMxEVOO4exADJA0u_+uFGcs9b6qhyg@mail.gmail.com>
	<CAEH-zfqJ-jRUSB7GZj7FKb-wnsZY1aft+_GMq2d+3oL3CRS+Cw@mail.gmail.com>
Date: Tue, 12 Nov 2013 17:19:59 -0800
Message-ID: 
 <CAKe7ALeVFO4NKfNnoY7btS04EuDx-3efw+MwCVrpMkEjR-exhA@mail.gmail.com>
Subject: Re: How to join quorum without restarting existing servers
From: "Bae, Jae Hyeon" <metacret@gmail.com>
To: user@zookeeper.apache.org
Content-Type: multipart/alternative; boundary=089e0160c5cc8c316304eb04c368

--089e0160c5cc8c316304eb04c368
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks a lot, German. Now I understand the strange behavior, so we decided
to list the servers by IP address instead of hostname. The problem went
away.
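
In case it helps anyone else, here is a rough Python sketch for spotting
server entries in zoo.cfg that still use hostnames. It assumes the usual
server.N=host:peerPort:electionPort lines with IPv4 addresses. The function
name, the sample addresses, and the hostname zookeeper-03 are made up for
illustration and are not our real configuration.

```python
import ipaddress

# Hypothetical zoo.cfg contents; addresses and hostname are placeholders.
SAMPLE_CFG = """
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=10.0.0.11:2888:3888
server.2=10.0.0.12:2888:3888
server.3=zookeeper-03:2888:3888
"""


def hostname_servers(cfg_text):
    """Return {sid: host} for server.N entries whose host is not an IP literal."""
    result = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line.startswith("server.") or "=" not in line:
            continue  # skip comments, blank lines, and other settings
        key, value = line.split("=", 1)
        sid = int(key.strip()[len("server."):])
        # Assumes IPv4 or a hostname, so the first ':' ends the host part.
        host = value.strip().split(":")[0]
        try:
            ipaddress.ip_address(host)
        except ValueError:
            result[sid] = host  # not an IP literal, so it depends on DNS
    return result


print(hostname_servers(SAMPLE_CFG))
```

Any entry it reports depends on DNS resolution by the other peers, which is
where our problem seemed to be.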


On Wed, Nov 6, 2013 at 8:34 PM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Hello again,
>
> I don't think it is a good idea to start a new thread for the same
> issue. Please continue in the latest thread.
>
> Could this be a DNS resolution caching problem?
> See https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>
> The new server has the lowest sid. It is able to connect to all other
> servers, but the rest of the servers don't seem able to connect to it.
> Connections from this server to the rest are useless, since they are
> dropped because of the sid comparison that you see in the log.
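
To make the sid rule described above concrete, here is a minimal Python
sketch. It is my own simplified model, not ZooKeeper code: the function name
and the peer_reach mapping are made up for illustration. It assumes that a
connection opened by the lower-sid server is dropped, and that the
higher-sid peer then dials back using whatever address it has resolved for
that server.

```python
def established_links(new_sid, peer_reach):
    """Return the set of peer sids the new server ends up connected to.

    peer_reach maps each peer's sid to True if that peer can open a
    connection to the new server (for example, it resolves the new
    server's current address), and False otherwise.
    """
    links = set()
    for peer_sid, peer_can_reach_new in peer_reach.items():
        if new_sid > peer_sid:
            # The new server has the higher sid, so its own outgoing
            # connection is the one that is kept.
            links.add(peer_sid)
        elif peer_can_reach_new:
            # The peer drops the incoming connection and dials back;
            # this only works if the peer can reach the new server.
            links.add(peer_sid)
    return links


# Lowest sid, peers hold a stale address: no usable links.
print(established_links(1, {2: False, 3: False}))
# Highest sid: outgoing connections are kept regardless.
print(established_links(3, {1: False, 2: False}))
```

Under this model a replacement with the lowest sid gets no links while the
peers hold a stale address, and a replacement with the highest sid is not
affected.
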
>
> You could try changing the server addresses in the configuration to the
> AWS public IP addresses of the peers, just to test whether that works. Or
> try replacing the server with the highest sid; that should also work.
> Otherwise (assuming the problem is DNS resolution), the only current
> workaround that I can think of is the rolling restart, as you have
> noticed.
>
>
> On Wed, Nov 6, 2013 at 6:39 PM, Diego Oliveira <lokimad@gmail.com> wrote:
>
> > Bae,
> >
> >    Just a note: when using ZooKeeper on Amazon AWS, instance IP
> > reassignment at restart is a nightmare. One solution is to do as you
> > said and use an elastic IP, but the limit of 5 is restrictive. Another
> > option is to configure a VPC. I ran into these problems last year.
> >
> > Att,
> >       Diego.
> >
> >
> > On Tue, Nov 5, 2013 at 4:18 PM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
> >
> > > I am attaching the log file. Could you take a look at why the new
> > > instance cannot join the quorum?
> > >
> > >
> > > On Tue, Nov 5, 2013 at 9:52 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
> > >
> > >> Thanks a lot Ben
> > >>
> > >> We are also using ZooKeeper in AWS with elastic IPs. The reason I
> > >> asked is that when a bad ZooKeeper EC2 instance is terminated and a
> > >> new instance is launched with the previous elastic IP, it cannot join
> > >> the quorum, and there are no specific error messages. But when I did
> > >> a rolling restart, the new instance started normally, synchronized,
> > >> and joined the quorum.
> > >>
> > >> As I understand German's response, the new instance should start,
> > >> synchronize, and join the quorum successfully without any impact on
> > >> the existing instances, but it didn't. I will investigate further.
> > >>
> > >> Thank you
> > >> Best, Jae
> > >>
> > >>
> > >> On Tue, Nov 5, 2013 at 8:24 AM, Ben Hall <ben@zynga.com> wrote:
> > >>
> > >>> Hi Jae,
> > >>>
> > >>> I wrote that article several years ago. (tbh - I hope it is not
> > >>> totally out of date by now).  I agree with German's points.
> > >>>
> > >>> The issue it was solving was how to replace a bad server without
> > >>> having to shut down the ensemble and without having to update the
> > >>> config files on each server. I would also add that this only works
> > >>> as long as the server names and ports are the same - iirc at the
> > >>> time the article was written we were using servers in AWS and
> > >>> referencing them either by assigned hostnames such as
> > >>> zookeeper-[01|11] or by elastic IPs that could be moved from server
> > >>> to server.
> > >>>
> > >>> If I understand your question correctly, if you are "adding a new
> > >>> server" such as going from 7 to 9 servers, then this approach won't
> > >>> benefit you as you.
> > >>>
> > >>> We also used this approach when we upgraded the servers, but like
> > >>> German said we did it one server at a time so that the leader
> > >>> election could be natural.  This allowed us to upgrade a pool of 11
> > >>> servers that were responsible for many thousands of client
> > >>> connections without any downtime.
> > >>>
> > >>> Thanks
> > >>> Ben
> > >>>
> > >>>
> > >>> On 11/5/13 6:51 AM, "German Blanco" <german.blanco.blanco@gmail.com>
> > >>> wrote:
> > >>>
> > >>> >... and make sure that there is no rubbish in the data dir of the
> > >>> >new server.
> > >>> >
> > >>> >
> > >>> >On Tue, Nov 5, 2013 at 3:49 PM, German Blanco <
> > >>> >german.blanco.blanco@gmail.com> wrote:
> > >>> >
> > >>> >> Hello Jae,
> > >>> >>
> > >>> >> I think that the answer to your question is "no, there is no
> > >>> >> benefit in a rolling restart in that case".
> > >>> >> If you remove a machine that was hosting a zookeeper server that
> > >>> >> was part of a cluster, and replace it with a new machine, with a
> > >>> >> zookeeper server running the same software version and listening
> > >>> >> on the same IP and ports, then this new server will join the
> > >>> >> cluster, synchronize and start working normally.
> > >>> >> I wouldn't recommend replacing more than one server at a time,
> > >>> >> and I think that it is better if the new server joins while the
> > >>> >> existing quorum is stable (avoid leader elections while the new
> > >>> >> server joins, i.e. avoid restarts or disconnections of the
> > >>> >> existing servers).
> > >>> >>
> > >>> >> Best regards,
> > >>> >>
> > >>> >> Germán.
> > >>> >>
> > >>> >>
> > >>> >> On Tue, Nov 5, 2013 at 6:42 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
> > >>> >>
> > >>> >>> Hi
> > >>> >>>
> > >>> >>> I read an article:
> > >>> >>>
> > >>> >>> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
> > >>> >>>
> > >>> >>> My question is: if failed hardware is replaced with a machine
> > >>> >>> that has the same IP address, do I need to do a rolling restart
> > >>> >>> to add the replacement hardware to the quorum?
> > >>> >>>
> > >>> >>> I am using ZooKeeper version 3.4.5.
> > >>> >>>
> > >>> >>> Thank you
> > >>> >>> Best, Jae
> > >>> >>>
> > >>> >>
> > >>> >>
> > >>>
> > >>>
> > >>
> > >
> >
> >
> > --
> > Att.
> > Diego de Oliveira
> > System Architect
> > diego@diegooliveira.com
> > www.diegooliveira.com
> > Never argue with a fool -- people might not be able to tell the difference
> >
>

--089e0160c5cc8c316304eb04c368--