zookeeper-user mailing list archives

From "Bae, Jae Hyeon" <metac...@gmail.com>
Subject Re: How to join quorum without restarting existing servers
Date Wed, 13 Nov 2013 01:19:59 GMT
Thanks a lot, German. Now I understand the strange behavior, so we
decided to use IP addresses in the server list instead of hostnames. The
problem went away.
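
For anyone who finds this thread later, below is a rough Python sketch of the
behavior German described, as I understand it: a connection opened by the
server with the lower sid is dropped, and the server with the higher sid
connects back using whatever address it resolved at startup. This is a toy
model and not ZooKeeper code. Peer, can_dial and link_established are names
made up for illustration, all addresses are invented, and the assumption that
a hostname resolves to a new address after the instance is replaced is mine.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Peer:
    sid: int
    # Addresses this peer resolved once at startup and never refreshed
    # (assumed behavior, see ZOOKEEPER-1506).
    cached: Dict[int, str] = field(default_factory=dict)


def can_dial(src: Peer, dst: Peer, live: Dict[int, str]) -> bool:
    """src reaches dst only if its cached address for dst is still the live one."""
    return src.cached.get(dst.sid) == live[dst.sid]


def link_established(initiator: Peer, receiver: Peer, live: Dict[int, str]) -> bool:
    """Model the sid comparison applied to an incoming election connection."""
    if not can_dial(initiator, receiver, live):
        return False
    if initiator.sid > receiver.sid:
        return True  # a connection opened by the higher sid is kept
    # The lower sid opened it: the receiver drops it and dials back itself.
    return can_dial(receiver, initiator, live)


# Hostname case: server 1 was replaced and its name now resolves elsewhere.
old_view = {1: "10.0.0.1", 2: "10.0.0.2", 3: "10.0.0.3"}
live = {1: "10.0.0.99", 2: "10.0.0.2", 3: "10.0.0.3"}
new_server = Peer(1, dict(live))

# An existing peer still holds the stale address for sid 1.
new_low_sid_joins = link_established(new_server, Peer(2, dict(old_view)), live)

# After a rolling restart the peer resolves the name again.
after_restart_joins = link_established(new_server, Peer(2, dict(live)), live)

# Replacing the server with the highest sid: its own connection is kept.
live_hi = {1: "10.0.0.1", 2: "10.0.0.2", 3: "10.0.0.99"}
highest_sid_joins = link_established(
    Peer(3, dict(live_hi)), Peer(1, dict(old_view)), live_hi
)

# Literal IPs in the server list: the address of sid 1 never changes.
ip_list = {1: "203.0.113.10", 2: "203.0.113.11", 3: "203.0.113.12"}
ip_list_joins = link_established(Peer(1, dict(ip_list)), Peer(2, dict(ip_list)), ip_list)

print(new_low_sid_joins, after_restart_joins, highest_sid_joins, ip_list_joins)
# prints: False True True True
```

In this model only the first case fails, which matches what we saw: the new
instance with the lowest sid could not join until the other servers were
restarted, or until the server list used fixed IP addresses.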


On Wed, Nov 6, 2013 at 8:34 PM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Hello again,
>
> I don't think it is a good idea to start a new thread with the same
> issue. Please continue in the latest thread.
>
> Could this be a DNS resolution caching problem?
> See https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>
> The new server has the lowest sid. It is able to connect to all other
> servers, but the rest of the servers don't seem able to connect to it.
> Connections from this server to the rest are useless, since they are
> dropped because of the sid comparison that you see in the log.
>
> You could try changing the server addresses in the configuration to the AWS
> public IP addresses of the peers, just to test if that works ok. Or try
> replacing the server with the highest sid; that should also work. Otherwise
> (assuming the problem is DNS resolution), the only current workaround that
> I can think of is the rolling restart, as you have noticed.
>
>
> On Wed, Nov 6, 2013 at 6:39 PM, Diego Oliveira <lokimad@gmail.com> wrote:
>
> > Bae,
> >
> >    Just a note: when using Zookeeper in Amazon AWS, the instance IP
> > reassignment at restart is a nightmare. One solution is to do as you said,
> > using an elastic IP, but the maximum of 5 is limiting. Another option is to
> > configure a VPC. I ran into these problems last year.
> >
> > Att,
> >       Diego.
> >
> >
> > On Tue, Nov 5, 2013 at 4:18 PM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
> >
> > > I am attaching the log file. Could you take a look at why the new instance
> > > cannot join the quorum?
> > >
> > >
> > > On Tue, Nov 5, 2013 at 9:52 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
> > >
> > >> Thanks a lot Ben
> > >>
> > >> We are also using zookeeper in AWS with elastic IP. The reason I asked
> > >> this question is that, when the bad Zookeeper EC2 instance is terminated
> > >> and a new instance is launched with the previous elastic IP, it cannot
> > >> join the quorum, and there are no specific error messages. But when I did
> > >> a rolling restart, the new instance started normally, synchronized and
> > >> joined the quorum.
> > >>
> > >> As I understand German's response, the new instance should start,
> > >> synchronize, and join the quorum successfully without any impact on
> > >> existing instances, but it didn't. I will investigate further.
> > >>
> > >> Thank you
> > >> Best, Jae
> > >>
> > >>
> > >> On Tue, Nov 5, 2013 at 8:24 AM, Ben Hall <ben@zynga.com> wrote:
> > >>
> > >>> Hi Jae,
> > >>>
> > >>> I wrote that article several years ago (tbh, I hope it is not totally
> > >>> out of date by now). I agree with German's points.
> > >>>
> > >>> The issue it was solving was how to replace a bad server without having
> > >>> to shut down the ensemble and without having to update the config files
> > >>> on each server. I would also add that this only works as long as the
> > >>> server names and ports are the same. IIRC, at the time the article was
> > >>> written we were using servers in AWS and referencing them either by
> > >>> assigned hostnames such as zookeeper-[01|11] or by elastic IPs that
> > >>> could be moved from server to server.
> > >>>
> > >>> If I understand your question correctly, if you are "adding a new
> > >>> server", such as going from 7 to 9 servers, then this approach won't
> > >>> benefit you.
> > >>>
> > >>> We also used this approach when we upgraded the servers, but like
> > >>> German said, we did it one server at a time so that the leader election
> > >>> could be natural. This allowed us to upgrade a pool of 11 servers that
> > >>> were responsible for many thousands of client connections without any
> > >>> downtime.
> > >>>
> > >>> Thanks
> > >>> Ben
> > >>>
> > >>>
> > >>> On 11/5/13 6:51 AM, "German Blanco" <german.blanco.blanco@gmail.com>
> > >>> wrote:
> > >>>
> > >>> >... and make sure that there is no rubbish in the data dir of the new
> > >>> >server.
> > >>> >
> > >>> >
> > >>> >On Tue, Nov 5, 2013 at 3:49 PM, German Blanco <
> > >>> >german.blanco.blanco@gmail.com> wrote:
> > >>> >
> > >>> >> Hello Jae,
> > >>> >>
> > >>> >> I think that the answer to your question is "no, there is no benefit
> > >>> >> in a rolling restart in that case".
> > >>> >> If you remove a machine that was hosting a zookeeper server that was
> > >>> >> part of a cluster, and replace it with a new machine, with a zookeeper
> > >>> >> server running the same software version and listening on the same IP
> > >>> >> and ports, then this new server will join the cluster, synchronize and
> > >>> >> start working normally.
> > >>> >> I wouldn't recommend replacing more than one server at a time, and I
> > >>> >> think that it is better if the new server joins while the existing
> > >>> >> quorum is stable (avoid leader elections while the new server joins,
> > >>> >> i.e. avoid restarts or disconnections of the existing servers).
> > >>> >>
> > >>> >> Best regards,
> > >>> >>
> > >>> >> Germán.
> > >>> >>
> > >>> >>
> > >>> >> On Tue, Nov 5, 2013 at 6:42 AM, Bae, Jae Hyeon <metacret@gmail.com>
> > >>> >> wrote:
> > >>> >>
> > >>> >>> Hi
> > >>> >>>
> > >>> >>> I read an article
> > >>> >>>
> > >>> >>>
> > >>> >>>
> > >>> >>> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
> > >>> >>>
> > >>> >>> My question is, even though failed hardware is replaced with the same
> > >>> >>> IP address, do I need to do a rolling restart to add the replaced
> > >>> >>> hardware to the quorum?
> > >>> >>>
> > >>> >>> I am using zookeeper version 3.4.5.
> > >>> >>>
> > >>> >>> Thank you
> > >>> >>> Best, Jae
> > >>> >>>
> > >>> >>
> > >>> >>
> > >>>
> > >>>
> > >>
> > >
> >
> >
> > --
> > Att.
> > Diego de Oliveira
> > System Architect
> > diego@diegooliveira.com
> > www.diegooliveira.com
> > Never argue with a fool -- people might not be able to tell the
> > difference
> >
>
