zookeeper-user mailing list archives

From Jérémie BORDIER <jeremie.bord...@gmail.com>
Subject Missing session state handling in most Leader Election implementations
Date Sun, 13 Nov 2011 23:40:02 GMT
Hi Folks,

We have been playing around with ZooKeeper for a few weeks now, and
while reading carefully through the documentation I noticed this statement:

If you are using watches, you must look for the connected watch event.
When a ZooKeeper client disconnects from a server, you will not
receive notification of changes until reconnected. If you are watching
for a znode to come into existence, you will miss the event if the
znode is created and deleted while you are disconnected.

As noted in ZOOKEEPER-1209, this can cause serious issues.
Since leader election is one of the most requested features / recipes, I
would really like to see the official recipe fixed and fully

I decided to take a look at other implementations of leader
election and, surprisingly, none of them seems to handle the
Disconnected / Expired / SyncConnected events in a simple way. Here's
my quick analysis of what they do, and I'd love to know whether I'm
missing something or whether they are really wrong:

The Twitter commons library's Election recipe is based on their "Group"
implementation, with EPHEMERAL|SEQUENTIAL nodes, in the same way as the
official LES algorithm. Looking at the Group impl (
), they handle the Expired event and retry the join / watch, which
calls getClient(), which in turn recreates the connection if it
has expired. This looks fine for the Expired event, but
what about the Disconnected / SyncConnected events? Nothing.
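
To make the Disconnected / SyncConnected gap concrete, here is a minimal
sketch in Java of what a participant could do once it sees SyncConnected
again: re-read the children of the election znode and recompute its role,
rather than trusting watches that may have been missed while disconnected.
It is not the Twitter commons code and has no ZooKeeper dependency. The
names ElectionCheck, Role and evaluate are hypothetical, the child list
stands in for the result of a getChildren call, and node names are assumed
to end with the zero-padded 10-digit sequence number that ZooKeeper appends
to SEQUENTIAL nodes.

```java
import java.util.List;

// Hypothetical helper, not part of ZooKeeper or of any library discussed here.
enum Role { LEADER, FOLLOWER, NOT_A_MEMBER }

final class ElectionCheck {
    private ElectionCheck() {}

    // Assumes the name ends with the 10-digit suffix of a SEQUENTIAL znode.
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
    }

    // Recomputes our role from a fresh listing of the election znode's children.
    static Role evaluate(List<String> children, String myNode) {
        if (!children.contains(myNode)) {
            // Our ephemeral node is gone (e.g. the session expired): rejoin.
            return Role.NOT_A_MEMBER;
        }
        int mine = sequenceOf(myNode);
        for (String child : children) {
            if (sequenceOf(child) < mine) {
                return Role.FOLLOWER; // someone with a lower sequence is ahead of us
            }
        }
        return Role.LEADER; // we hold the lowest sequence number
    }
}
```

A caller would run evaluate() with a fresh child listing after every
SyncConnected event, and rejoin the election when it returns NOT_A_MEMBER.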

Netflix's Curator library takes an approach where the leader acquires an
inter-process mutex, also backed by a group of EPHEMERAL|SEQUENTIAL nodes.
Netflix's library has a big advantage: it has a built-in API for
retrying actions, so leader election will try to acquire the lock and
retry if anything goes wrong in the middle. On any event, the
loop waiting for the lock is notified and retries on
failure, so a Disconnected or Expired event would be handled
properly. On the other hand, it seems that once the leader has been
elected, these events are simply ignored. This may lead to the same
split-brain issue as in the original LES example (see
for details).
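
As an illustration of what handling these events after the election could
look like, below is a small sketch in Java of a guard that an elected
leader consults before doing leader-only work. It is not Curator's API or
behavior: SessionState is a local stand-in that mirrors ZooKeeper's
KeeperState names, and LeadershipGuard and its methods are hypothetical.
The sketch assumes the guard is created while connected, and it is
deliberately conservative: on Disconnected the leader suspends itself
because it cannot know whether its session is still alive, and on Expired
it treats its ephemeral node as gone and asks to rejoin the election.

```java
// Local stand-in for the states discussed above; the real ones live in
// org.apache.zookeeper.Watcher.Event.KeeperState.
enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical guard, not part of any library.
final class LeadershipGuard {
    private boolean elected;          // we won the election in the current session
    private boolean connected = true; // assumption: created while connected
    private boolean rejoinNeeded;     // set when the session has expired

    synchronized void onElected() {
        elected = true;
        rejoinNeeded = false;
    }

    synchronized void onSessionEvent(SessionState state) {
        switch (state) {
            case DISCONNECTED:
                connected = false; // leadership is unknown: stop acting as leader
                break;
            case SYNC_CONNECTED:
                // Same session resumed, so the ephemeral node should still exist.
                // A stricter version would re-read the election znode here.
                connected = true;
                break;
            case EXPIRED:
                connected = false;
                elected = false;     // the ephemeral node is gone with the session
                rejoinNeeded = true; // caller must create a new session and rejoin
                break;
        }
    }

    synchronized boolean mayActAsLeader() { return elected && connected; }

    synchronized boolean mustRejoin() { return rejoinNeeded; }
}
```

The leader would call mayActAsLeader() before each unit of leader-only
work, which narrows (but does not fully close) the window in which two
processes both believe they are the leader.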

That's all I have come up with so far. If you have the time to take a
look at these implementations, I would love to know if I missed
something. I think this split-brain issue may almost never happen,
but as usual, what should never happen hits you hard when you don't
expect it. A robust leader election implementation would be really
great to have.
