mesos-dev mailing list archives

From Shuai Lin <>
Subject About MESOS-1806 (Etcd as an alternative to zookeeper)
Date Wed, 20 Jan 2016 06:42:26 GMT
Hi Benjamin and all,

I'd like to talk about MESOS-1806. Since I took over this ticket partway
through and there was no design doc for it, I have created one based on the current

Besides, there are some details I'd like to discuss:

1. Etcd servers won't accept requests from clients during the leader
election phase. So when there is a leader re-election among the etcd
servers, the request from the current master to renew the timestamp of the
v2/keys/mesos node would fail, and the current code would immediately retry
with the next server, which would refuse the request as well. The master
would then exit because every server has failed its request. The same
happens with slaves: the detector would fail after requests to all the etcd
servers are refused. To solve this, we should add logic to wait for a while
before trying the next server.

2. If the current master somehow fails to update the v2/keys/mesos node
in time, that node would expire, the detector would detect this, and the
master would commit suicide due to loss of leadership. This is correct
behavior, but the current TTL is rather small: only 5 seconds, and the
current master is set to update the node at 80% of the TTL, i.e. at the 4th
second. That leaves only 1 second of slack, so the chance of hitting this
problem is not that low, e.g. if there is a transient network problem. This
can be mitigated by increasing the TTL to 10 seconds and letting the
current master try to update the etcd node at 60% of the TTL.

3. The current implementation requires the list of masters to be specified
in the "--masters=..." flag (used for the replicated log's quorum). This
makes it inconvenient to add new masters to the cluster: every existing
master must be restarted with an updated "--masters=" flag. What about
creating a directory in the etcd key space and letting each master create a
child node in that directory?
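
To make item 1 concrete, here is a rough sketch of the "wait before trying the next server" logic. It is not the actual detector code: the function name, the doubling delay, and the attempt limit are all assumptions made for illustration, and `send` stands in for whatever performs the real request to an etcd server.

```python
import itertools
import time


def request_with_wait(servers, send, max_attempts=6,
                      initial_delay=0.5, max_delay=4.0, sleep=time.sleep):
    """Cycle through servers, waiting before each retry.

    `send(server)` is a hypothetical callable returning True on success.
    The delay doubles after each refusal (capped at `max_delay`) so that a
    short etcd leader re-election can finish before we give up.
    """
    delay = initial_delay
    attempts = itertools.islice(itertools.cycle(servers), max_attempts)
    for attempt, server in enumerate(attempts):
        if send(server):
            return server  # this server accepted the request
        if attempt < max_attempts - 1:
            sleep(delay)  # wait instead of retrying immediately
            delay = min(delay * 2, max_delay)
    return None  # every attempt was refused; the caller decides what to do
```

With three servers and the defaults above, the caller would only give up after two full passes over the server list, spread over several seconds, rather than after one immediate pass.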

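The arithmetic behind item 2 can be checked with a few lines. The helper name below is made up for illustration; the numbers are the ones from the item (5 seconds at 80% versus 10 seconds at 60%).

```python
def renewal_slack(ttl_seconds, renew_percent):
    """Return (renew_at, slack) in seconds for a TTL and a renewal point.

    `slack` is how long a renewal may be delayed (e.g. by a network
    hiccup) before the node expires.
    """
    renew_at = ttl_seconds * renew_percent / 100
    return renew_at, ttl_seconds - renew_at


print(renewal_slack(5, 80))   # current settings  -> (4.0, 1.0)
print(renewal_slack(10, 60))  # proposed settings -> (6.0, 4.0)
```

So the proposed settings give the master 4 seconds instead of 1 second to get a renewal through before it loses leadership.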

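The idea in item 3 could look roughly like the sketch below. It uses an in-memory stand-in rather than a real etcd client, and the directory name "/mesos/masters", the class, and the function names are all hypothetical; it only shows the shape of "each master creates a child node, and the member list is read from the directory".

```python
class FakeKeySpace:
    """In-memory stand-in for an etcd key space (not a real etcd client)."""

    def __init__(self):
        self._nodes = {}  # full key -> value

    def set(self, key, value):
        self._nodes[key] = value

    def list_dir(self, directory):
        """Return the values of all child nodes under `directory`, sorted by key."""
        prefix = directory.rstrip("/") + "/"
        return [value for key, value in sorted(self._nodes.items())
                if key.startswith(prefix)]


def register_master(store, directory, master_id, address):
    """Each master creates its own child node in the shared directory."""
    store.set(directory.rstrip("/") + "/" + master_id, address)


def current_masters(store, directory):
    """Derive the master list from the directory instead of a flag."""
    return store.list_dir(directory)


store = FakeKeySpace()
register_master(store, "/mesos/masters", "m1", "10.0.0.1:5050")
register_master(store, "/mesos/masters", "m2", "10.0.0.2:5050")
print(current_masters(store, "/mesos/masters"))
```

Adding a third master is then just one more `register_master` call; the existing masters would pick it up by re-reading the directory, without a restart.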