zookeeper-user mailing list archives

From Marcin Cabaj <marcin.ca...@datasift.com>
Subject Re: how to fix messed up servers id
Date Wed, 19 Feb 2014 14:19:54 GMT
Hi German,

Thanks for your help. I was just curious whether it is possible (and it looks
like it will be, in 3.5.0, using reconfig).
My original problem (one ZK service was down) is resolved now. Thanks!

-- 
mc


On Wed, Feb 19, 2014 at 1:47 PM, German Blanco <german.blanco.blanco@gmail.com> wrote:

> Perhaps there is no way to do what you want.
> You have 3 servers and you want to update the configuration on all of them
> so that the ids are changed. Whatever you do, you will need to restart each
> of the servers for it to see the new configuration. Say you start by
> updating one server: if you change its sid, it will not be able to connect
> to the other two, according to your observation, so there will be an
> ensemble of two running. Now you either go back to the initial state or
> restart a server of the running ensemble.
> Besides, what is the practical point of this? A server id is only an
> arbitrary number that identifies a server.
>
>
>
> On Wed, Feb 19, 2014 at 1:51 PM, Marcin Cabaj <marcin.cabaj@datasift.com> wrote:
>
> > Hi,
> >
> > Thank you for your answer.
> >
> > ZooKeeper version I'm testing on: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
> > The reconfig feature will be available in version 3.5.0.
> >
> > Yes, you are right about the rolling restart, but while I restart the
> > second service, the whole cluster will be down.
> > I'm trying to avoid downtime.
> >
> > --
> > cheers
> > mc
> >
> >
> >
> > On Wed, Feb 19, 2014 at 12:31 PM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> >
> > > So servers do check the sid against the server list. Sorry about that.
> > > Maybe you are running trunk and I work mainly with 3.4.5, or maybe I was
> > > completely wrong.
> > > Why don't you just update all the files first, and then do a rolling
> > > restart? The first server that is restarted will not be able to join the
> > > quorum, but hopefully when you restart the second it will form a quorum
> > > with the first, and then when you restart the third everything is back
> > > to normal.
> > > Or try the reconfig feature? I haven't tried it myself, but it should be
> > > more or less the same as updating the files.
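A rough sketch of why this rolling restart still leaves a gap with three servers. It models each server as its `myid` plus the set of ids in its `server.N` lines and assumes, based on the "Invalid server id" warning quoted further down the thread, that two servers only cooperate when each one's id is in the other's list. The ids follow Marcin's test ensemble with zoo1 renumbered from 41 to 1; the helper names are hypothetical, and leader-election time is ignored, so `True` means a quorum is possible, not that clients are being served.

```python
from itertools import combinations

# ASSUMPTION: two servers can be in the same quorum only if each one's id is
# listed in the other's server.N entries (inferred from the "Invalid server id"
# warning in this thread, not taken from the ZooKeeper source).
def can_talk(a, b):
    return a["myid"] in b["config"] and b["myid"] in a["config"]

def has_quorum(running):
    """True if some mutually compatible group of running servers is a strict
    majority of the ensemble size each member is configured with."""
    for size in range(len(running), 0, -1):
        for group in combinations(running, size):
            compatible = all(can_talk(a, b) for a, b in combinations(group, 2))
            majority = all(size > len(s["config"]) // 2 for s in group)
            if compatible and majority:
                return True
    return False

def server(myid, config):
    return {"myid": myid, "config": config}

OLD = {41, 42, 43}  # Marcin's test ensemble as started
NEW = {1, 42, 43}   # the same ensemble with zoo1 renumbered from 41 to 1

old1, old2, old3 = server(41, OLD), server(42, OLD), server(43, OLD)
new1, new2, new3 = server(1, NEW), server(42, NEW), server(43, NEW)

# Rolling restart after all files have been updated on disk.
steps = [
    ("all three on the old config", [old1, old2, old3]),
    ("zoo1 restarted as id 1", [new1, old2, old3]),
    ("zoo2 down for its restart", [new1, old3]),
    ("zoo2 back on the new config", [new1, new2, old3]),
    ("zoo3 down for its restart", [new1, new2]),
    ("zoo3 back on the new config", [new1, new2, new3]),
]

for label, running in steps:
    print(f"{label}: quorum possible = {has_quorum(running)}")
```

Under these assumptions only the step where zoo2 is down has no possible quorum, which is the window Marcin describes; in practice the outage is probably a little longer, since a new leader election also has to complete.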
> > >
> > >
> > > On Wed, Feb 19, 2014 at 1:08 PM, Marcin Cabaj <marcin.cabaj@datasift.com> wrote:
> > >
> > > > OK, I fixed it.
> > > >
> > > > The only thing I changed was zoo.conf:
> > > > server.1=zoo0:2888:3888
> > > > server.0=zoo1:2888:3888
> > > > and then I restarted zoo0.
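The fix above amounts to making each `server.N` line agree with the myid file already on that host, rather than changing the myid files. A minimal sketch of that idea, assuming the myid values reported in the original message at the bottom of the thread (zoo0 holds 1, zoo1 holds 0, zoo2 holds 2) and the 2888/3888 ports used here; `server_lines` is a hypothetical helper, not a ZooKeeper tool.

```python
# Hostname -> contents of that host's myid file, as reported in the thread.
ACTUAL_MYIDS = {"zoo0": 1, "zoo1": 0, "zoo2": 2}

def server_lines(myids, quorum_port=2888, election_port=3888):
    """Build zoo.cfg server.N entries so that N matches each host's myid."""
    return [
        f"server.{sid}={host}:{quorum_port}:{election_port}"
        for host, sid in sorted(myids.items(), key=lambda item: item[1])
    ]

for line in server_lines(ACTUAL_MYIDS):
    print(line)
```

The generated `server.0=zoo1:2888:3888` and `server.1=zoo0:2888:3888` entries are the two lines Marcin changed.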
> > > >
> > > > In the meantime I've created a test ensemble to test 'changing' a
> > > > server ID. My starting configuration:
> > > >
> > > > zoo1.conf, zoo2.conf, zoo3.conf:
> > > > server.41=localhost:2888:3888
> > > > server.42=localhost:2889:3889
> > > > server.43=localhost:2890:3890
> > > >
> > > > zoo1/myid = 41
> > > > zoo2/myid = 42
> > > > zoo3/myid = 43
> > > >
> > > > zoo3 is the LEADER
> > > >
> > > > At this point, I can't change the server id of a follower :(
> > > > What I did:
> > > > 1) change zoo1/myid = 1
> > > > 2) change zoo1.conf:
> > > > server.1=localhost:2888:3888
> > > > server.42=localhost:2889:3889
> > > > server.43=localhost:2890:3890
> > > > 3) restart zoo1
> > > >
> > > > In the logs I see that the LEADER complains about an invalid server id, 1:
> > > > 2014-02-19 12:05:50,534 [myid:43] - WARN  [localhost/127.0.0.1:3890:QuorumCnxManager@344] - Invalid server id: 1
> > > >
> > > > Question: how can I change the server ID of one of the servers without
> > > > shutting down the whole ensemble?
> > > >
> > > > --
> > > > cheers
> > > > mc
> > > >
> > > >
> > > >
> > > > On Tue, Feb 18, 2014 at 3:54 PM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> > > >
> > > > > Leave it as it is. Servers do not check whether the sid from another
> > > > > server is in that list ... At least I believe they don't, and my
> > > > > experience so far confirms it. And if they did check strictly, you
> > > > > wouldn't have reached your current state.
> > > > >
> > > > > On Tuesday, February 18, 2014, Marcin Cabaj <marcin.cabaj@datasift.com> wrote:
> > > > >
> > > > > > Thanks, I will try it tomorrow.
> > > > > > One thing I'm wondering: if I set zoo0's id to e.g. 5, should I
> > > > > > update zoo.cfg on the other servers?
> > > > > > If so, a restart is needed as well, right? That will crash my
> > > > > > cluster. Or should I just leave zoo.cfg as is?
> > > > > >
> > > > > > --
> > > > > > cheers
> > > > > > mc
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 18, 2014 at 1:41 PM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> > > > > >
> > > > > > > For this step:
> > > > > > > "Set a different id in the myid file of server 0 (the one that is
> > > > > > > down), restart it, verify that it joins the quorum." Any value that
> > > > > > > is not used should do, e.g. 3, 4, 5, 1231 ...
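If you want to script this step, a small helper can pick the lowest id not already listed in zoo.cfg. This is a hypothetical sketch: it only looks at the `server.N=` lines (not at the myid files), and it restricts the choice to 1..255, the range the ZooKeeper Administrator's Guide recommends for myid values.

```python
import re

def used_ids(cfg_text):
    """Collect N from every 'server.N=...' line of zoo.cfg text."""
    pattern = r"^\s*server\.(\d+)\s*="
    return {int(m.group(1)) for m in re.finditer(pattern, cfg_text, re.MULTILINE)}

def pick_unused_id(cfg_text, low=1, high=255):
    """Return the smallest id in [low, high] that zoo.cfg does not use yet."""
    taken = used_ids(cfg_text)
    for candidate in range(low, high + 1):
        if candidate not in taken:
            return candidate
    raise ValueError("no free server id in range")

ZOO_CFG = """\
server.0=zoo0:2888:3888
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
"""

print(pick_unused_id(ZOO_CFG))
```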
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Feb 18, 2014 at 12:04 PM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello!
> > > > > > > > Set a different id in the myid file of server 0 (the one that is
> > > > > > > > down), restart it, verify that it joins the quorum.
> > > > > > > > If it joins the quorum, set the myid value on server 1 to 1,
> > > > > > > > restart it, verify that it joins the quorum.
> > > > > > > > If it joins the quorum, update the myid file of server 0 again,
> > > > > > > > this time to the correct value, 0. Restart it, verify that it all
> > > > > > > > works.
> > > > > > > >
> > > > > > > > If any of the steps fails, stop and think it all over again.
> > > > > > > >
> > > > > > > > Good luck.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tuesday, February 18, 2014, Marcin Cabaj <marcin.cabaj@datasift.com> wrote:
> > > > > > > >
> > > > > > > > >> Hi all,
> > > > > > > > >>
> > > > > > > > >> My ZooKeeper ensemble contains 3 servers; unfortunately, the
> > > > > > > > >> server ids have somehow been messed up.
> > > > > > > >>
> > > > > > > >> zoo.cfg on all servers:
> > > > > > > >> server.0=zoo0:2888:3888
> > > > > > > >> server.1=zoo1:2888:3888
> > > > > > > >> server.2=zoo2:2888:3888
> > > > > > > >>
> > > > > > > >> but:
> > > > > > > >> on ZOO0:
> > > > > > > >> [xxx@zoo0]$ cat /var/zookeeper/myid
> > > > > > > >> 1
> > > > > > > >> [xxx@zoo0]$ echo conf | nc localhost 2181
> > > > > > > > >> This ZooKeeper instance is not currently serving requests
> > > > > > > >>
> > > > > > > >> on ZOO1:
> > > > > > > >> [xxx@zoo1] $ cat /var/zookeeper/myid
> > > > > > > >> 0
> > > > > > > > >> [xxx@zoo1:~]$ echo conf | nc localhost 2181 | grep serverId
> > > > > > > > >> serverId=0
> > > > > > > >>
> > > > > > > >> on ZOO2:
> > > > > > > >> [xxx@zoo2:~]$ cat /var/zookeeper/myid
> > > > > > > >> 2
> > > > > > > > >> [xxx@zoo2:~]$ echo conf | nc localhost 2181 | grep serverId
> > > > > > > >> serverId=2
> > > > > > > >>
> > > > > > > > >> How can I fix this without shutting down the whole ensemble?
> > > > > > > > >> Currently I have connections established to ZOO1 and ZOO2.
> > > > > > > > >> ZOO0 is listening on 2181 but doesn't accept connections.
> > > > > > > > >> ZOO2 is the leader.
> > > > > > > >>
> > > > > > > > >> ZooKeeper version: 3.3.5-cdh3u5--1, built on 10/06/2012 01:58 GMT
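A hedged sketch for catching this kind of mix-up: compare the id each host actually has (its myid file, or the `serverId` line from the `conf` command used above) with the `server.N` line that names that host in zoo.cfg. All function names are hypothetical helpers, the parsing assumes the `server.N=host:port:port` form shown in this thread, and `four_letter_word` needs network access to port 2181.

```python
import re
import socket

def four_letter_word(host, cmd="conf", port=2181, timeout=5.0):
    """Send a four-letter command (like `echo conf | nc host 2181`) and
    return the reply; the server closes the socket when it is done."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def server_id_from_conf(reply):
    """Return serverId from `conf` output, or None if it is absent
    (e.g. "This ZooKeeper instance is not currently serving requests")."""
    match = re.search(r"^serverId=(\d+)", reply, re.MULTILINE)
    return int(match.group(1)) if match else None

def expected_ids(cfg_text):
    """Map hostname -> N for every 'server.N=host:port:port' line in zoo.cfg."""
    pattern = r"^\s*server\.(\d+)\s*=\s*([^:\s]+)"
    return {m.group(2): int(m.group(1)) for m in re.finditer(pattern, cfg_text, re.MULTILINE)}

def find_mismatches(cfg_text, actual_ids):
    """List (host, id in zoo.cfg, actual id) for every host that disagrees."""
    expected = expected_ids(cfg_text)
    return [
        (host, expected[host], actual_ids.get(host))
        for host in sorted(expected)
        if actual_ids.get(host) != expected[host]
    ]

ZOO_CFG = """\
server.0=zoo0:2888:3888
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
"""

# Values from the myid files shown above. Against a live ensemble you could
# use server_id_from_conf(four_letter_word(host)) instead, but a server that
# is not serving (ZOO0 here) yields None that way.
reported = {"zoo0": 1, "zoo1": 0, "zoo2": 2}
print(find_mismatches(ZOO_CFG, reported))
# prints [('zoo0', 0, 1), ('zoo1', 1, 0)]
```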
> > > > > > > >>
> > > > > > > >> Cheers
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
