mesos-user mailing list archives

From Donald Laidlaw
Subject Re: Zookeeper cluster changes
Date Tue, 10 Nov 2015 19:23:37 GMT
I agree, you want to apply the changes gradually so as not to lose a quorum. The problem is
automating this so that it happens in a lights-out environment, in the cloud, without some
poor slob's pager going off in the middle of the night :)

While health checks can reliably detect and replace a dead server on any number of clouds,
the new server comes up with a new IP address. That server can reliably join the zookeeper
ensemble. However, it is tough to automate the rolling restart of the other mesos servers,
both masters and slaves, that needs to occur to keep them happy.

One thing I have not tried is to just ignore the change, and use something to detect the masters
just prior to starting mesos. If they truly fail fast when they lose a zookeeper connection,
then maybe they don't care that they have been started with an out-of-date list of zookeeper
servers.

What do mesos-master and mesos-slave do with a list of zookeeper servers to connect to?
Just try them in order until one works, then use that one until it fails? If so, and it fails
fast, then letting it continue to run with a stale list will have no ill effects. Or does
it keep trying the servers in the list when a connection fails? 
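
For the "detect just prior to starting mesos" idea, a minimal sketch might look like the following. It assumes some discovery step (not shown, and entirely hypothetical) has already produced the current list of zookeeper hosts; the sketch only assembles the zk:// URL and the mesos-master command line from that list. The port 2181, the /mesos path, and the work directory are illustrative defaults, not values taken from this thread.

```python
def build_zk_url(hosts, port=2181, path="/mesos"):
    """Build a zk:// URL such as zk://a:2181,b:2181/mesos from discovered hosts."""
    if not hosts:
        raise ValueError("no zookeeper servers discovered")
    # De-duplicate and sort so the same set of hosts always yields the same URL.
    servers = ",".join(f"{host}:{port}" for host in sorted(set(hosts)))
    return f"zk://{servers}{path}"


def master_command(hosts, quorum, work_dir="/var/lib/mesos"):
    """Return the argv a supervisor could exec to start mesos-master.

    The host list is assumed to come from a discovery step run right
    before start, so a restarted master always gets a fresh list.
    """
    return [
        "mesos-master",
        f"--zk={build_zk_url(hosts)}",
        f"--quorum={quorum}",
        f"--work_dir={work_dir}",
    ]
```

If the daemons really do exit on a lost connection and a supervisor reruns this at each start, a stale list would only live as long as the current process.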

Don Laidlaw

> On Nov 10, 2015, at 4:42 AM, Erik Weathers wrote:
> Keep in mind that mesos is designed to "fail fast".  So when there are problems (such
> as losing connectivity to the resolved ZooKeeper IP) the daemon(s) (master & slave) die.
> Due to this design, we are all supposed to run the mesos daemons under "supervision",
> which means auto-restart after they crash.  This can be done with monit/god/runit/etc.
> So, to perform maintenance on ZooKeeper, I would first ensure the mesos-master processes
> are running under "supervision" so that they restart quickly after a ZK connectivity failure
> occurs.  Then proceed with standard ZooKeeper maintenance (exhibitor-based or manual), pausing
> between downing of ZK servers to ensure you have "enough" mesos-master processes running.
> (I *would* say "pause until you have a quorum of mesos-masters up", but if you only have
> 2 of 3 up and then take down the ZK that the leader is connected to, that would be temporarily
> bad.  So I'd make sure they're all up.)
> - Erik
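
The pause condition Erik describes could be sketched roughly as below. The majority formula is the usual one for a ZooKeeper ensemble; the function names and the idea of feeding in counts from some health check are assumptions made for illustration, not part of Mesos or ZooKeeper.

```python
def quorum_size(ensemble_size):
    """Smallest majority of an ensemble, e.g. 2 of 3 or 3 of 5."""
    return ensemble_size // 2 + 1


def safe_to_take_down_zk(masters_up, masters_total, zk_up, zk_total):
    """Decide whether the next ZK server may be stopped.

    Follows the stricter rule from the message above: wait until *all*
    mesos-masters are back up, not merely a quorum of them, and only
    proceed if the ZK ensemble keeps a majority after losing one server.
    """
    all_masters_up = masters_up == masters_total
    zk_keeps_quorum = (zk_up - 1) >= quorum_size(zk_total)
    return all_masters_up and zk_keeps_quorum
```

A maintenance script would poll this between steps and sleep while it returns False.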
> On Mon, Nov 9, 2015 at 11:07 PM, Marco Massenzio wrote:
> The way I would do it in a production cluster would be *not* to use IP addresses directly
> for the ZK ensemble, but instead rely on some form of internal DNS and use internally-resolvable
> hostnames (eg, {zk1, zk2, ...} etc) and
> have the provisioning tooling (Chef, Puppet, Ansible, what have you) handle the setting of
> the hostname when restarting/replacing a failing/crashed ZK server.
> This way your list of ZKs given to Mesos never changes, even though the FQDNs will map to
> different IPs / VMs.
> Obviously, this may not always be desirable / feasible (eg, if your prod environment
> does not support DNS resolution).
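
As a rough illustration of the hostname approach: the URL given to Mesos is built from names only, so it stays the same across replacements, while the name-to-IP mapping is whatever DNS currently says. The hostnames below are hypothetical placeholders, and the sketch makes no claim about when Mesos or the ZK client re-resolves names after startup.

```python
import socket

# Hypothetical stable names; real ones depend on your internal DNS zone.
ZK_HOSTNAMES = [
    "zk1.example.internal",
    "zk2.example.internal",
    "zk3.example.internal",
]

# Built from names only, so this string never changes when a server is replaced.
ZK_URL = "zk://" + ",".join(f"{name}:2181" for name in ZK_HOSTNAMES) + "/mesos"


def resolve_all(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to its current IP, or None if it does not resolve."""
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[name] = None
    return resolved
```

Provisioning tooling could call resolve_all after replacing a server to confirm the name now points at the new VM.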
> You are correct in that Mesos does not currently support dynamically changing the ZK
> addresses, but I don't know whether that's a limitation of Mesos code or of the ZK C++ client.
> I'll look into it and let you know what I find (if anything).
> --
> Marco Massenzio
> Distributed Systems Engineer
> On Mon, Nov 9, 2015 at 6:01 AM, Donald Laidlaw wrote:
> How do mesos masters and slaves react to zookeeper cluster changes? When the masters
> and slaves start they are given a set of addresses to connect to zookeeper. But over time,
> one of those zookeepers fails, and is replaced by a new server at a new address. How should
> this be handled in the mesos servers?
> I am guessing that mesos does not automatically detect and react to that change. But
> obviously we should do something to keep the mesos servers happy as well. What should we do?
> The obvious thing is to stop the mesos servers, one at a time, and restart them with
> the new configuration. But it would be really nice to be able to do this dynamically without
> restarting the server. After all, coordinating a rolling restart is a fairly hard job.
> Any suggestions or pointers?
> Best regards,
> Don Laidlaw
