zookeeper-user mailing list archives

From John Carrino <john.carr...@gmail.com>
Subject Check Leadership on each request
Date Wed, 15 Aug 2012 19:02:15 GMT
I have a service I would like to run with strong consistency guarantees.
One thing I'd like to avoid is trusting any clock except the one on the
leader that is doing the session expiration, and even then I don't trust
that clock as much as the invalidation sent as a Zab event.  For my highly
consistent service I need to check on every request that my node is still
the leader.

The main thing that needs to be avoided is that two hosts think they are
the leader and serve requests.  Normally in ZooKeeper this is done by
having the watcher get a ConnectionTimeout event, but for my service I need
to ensure there is a "happens after" relationship here.  My clock might be
messed up, the JVM might suspend my process, the host may be virtualized
and put to sleep, or any number of other reasons, so I don't want to trust
this clock.  This implies I need to make sure that the ZK leader thinks my
session is still valid during the request.  It is ok if my service loses
leadership during the call.  The main thing I require is that there is a
happens-after relationship between the client initiating the request and my
server checking to see if it is still the leader.

The guarantee I want to provide is that at the time my service receives a
request, no other host has served a request since my service was appointed
leader.

Basically, the two ways I can think of to ensure this are to do a write to
the ephemeral node used for my leadership, or to issue a sync request and
then do a read of my ephemeral node.

I am pretty sure doing the write will get me the guarantee that I need.

Value doRequest(args) {
  // A write must be committed by the leader with a quorum, and throws
  // if my session has expired.  Version -1 means "any version".
  zk.setData(myLockNode, bytes, -1);
  return doRequestInternal();
}
or this

Value doRequest(args) {
  zk.sync(myLockNode);   // (wait for response; the real Java API takes a path and a callback)
  zk.exists(myLockNode); // (wait for response)
  return doRequestInternal();
}
However I am less sure about the sync.  Is it safe to just do a sync and an
exists?  What if the node I am hitting is getting data from the wrong
leader (partitioned from the rest of the nodes)?  Do I have to do a write
to ensure that the current leader actually has a quorum?  Does sync ensure
that the current leader still holds a quorum?

This ensures that I have the lock according to the leader at the time the
request starts.  I did some perf benchmarks and was surprised how long the
sync and read call took.  It also seemed slower on spinny disks than on
SSDs, which I thought was weird because it never needs to hit disk.

My question is: can I lower the latency of this check?  Can I issue a
heartbeat command that will make its way to the leader (ensuring the leader
still has a quorum, possibly by pinging its followers) and back to me, and
never have to hit disk?

Something like the following would be great for me:

Value doRequest(args) {
  zk.heartbeat(); // hypothetical API -- no such call exists today; wait for response
  return doRequestInternal();
}

I assume the heartbeat will talk to the follower which will forward to the
ZK leader and will return success only if my session is still valid.

If my understanding is correct, the heartbeat would never have to hit disk
and should take just three network hops of latency (client to follower,
follower to leader, leader to its followers).  This should be faster than
a sync and read.

Does anyone have an intuition on whether this would indeed be faster, and
how feasible it is?  I looked into the heartbeat code, and it is buried
pretty deep under the covers of the ZooKeeper class alongside the socket
code.

