zookeeper-user mailing list archives

From Zhao Boran <hulunb...@gmail.com>
Subject Getting confused with the "recipe for lock"
Date Fri, 11 Jan 2013 13:46:12 GMT
While reading ZooKeeper's "recipe for lock",
I got confused:

It seems that this recipe for a distributed lock cannot guarantee that *"any
snapshot in time no two clients think they hold the same lock"*.

But since ZooKeeper is so widely adopted, if there were such a mistake in
the reference doc, someone would have pointed it out a long time ago.

So, what did I misunderstand? Please help me!

Recipe-for-distributed-lock (from


Fully distributed locks that are globally synchronous, *meaning at any
snapshot in time no two clients think they hold the same lock*. These can
be implemented using ZooKeeper. As with priority queues, first define a
lock node.

   1. Call create( ) with a pathname of "*locknode*/guid-lock-" and the
   sequence and ephemeral flags set.
   2. Call getChildren( ) on the lock node without setting the watch flag
   (this is important to avoid the herd effect).
   3. If the pathname created in step 1 has the lowest sequence number
   suffix, the client has the lock and the client exits the protocol.
   4. The client calls exists( ) with the watch flag set on the path in the
   lock directory with the next lowest sequence number.
   5. If exists( ) returns false, go to step 2. Otherwise, wait for a
   notification for the pathname from the previous step before going to step 2.
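
The decision made in steps 3 and 4 can be sketched as a pure function, independent of any client library. This is only an illustration: `sequence_of` and `next_action` are hypothetical helper names, not part of the ZooKeeper API, and the child names are assumed to end in "-<sequence number>" as the sequence flag produces.

```python
def sequence_of(name):
    """Return the integer sequence suffix of a child name such as 'guid-lock-3'."""
    return int(name.rsplit("-", 1)[1])


def next_action(children, my_node):
    """Decide what a client does after getChildren( ) (steps 3 and 4).

    Returns ("locked", None) if my_node has the lowest sequence number,
    otherwise ("watch", name), where name is the child whose sequence
    number is the next lowest below my_node.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(my_node)
    if index == 0:
        return ("locked", None)
    return ("watch", ordered[index - 1])


# Two contenders: the owner of guid-lock-0 holds the lock,
# the owner of guid-lock-1 watches guid-lock-0.
print(next_action(["guid-lock-1", "guid-lock-0"], "guid-lock-0"))
print(next_action(["guid-lock-1", "guid-lock-0"], "guid-lock-1"))
```

Each waiting client watches only its immediate predecessor, so a release wakes one client rather than all of them.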

Considering the following case:


   Client1 successfully acquired the lock (in step 3), with zk node

   Client2 created node "locknode/guid-lock-1", failed to acquire the lock,
   and is watching "locknode/guid-lock-0".

   Later, for some reason (network congestion?), client1 fails to send its
   heartbeat message to the zk cluster in time, but client1 is still working
   perfectly and assumes it still holds the lock.

   But ZooKeeper may decide that client1's session has timed out, and then it:
      1. deletes "locknode/guid-lock-0"
      2. sends a notification to Client2 (or sends the notification first?)
      3. cannot deliver the "session timeout" notification to client1 in time
      (due to network congestion?)


   Client2 gets the notification, goes to step 2, and finds only the node
   "locknode/guid-lock-1", which it created itself; thus, client2 assumes
   it holds the lock.

   But at the same time, client1 still assumes it holds the lock.
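
To make the interleaving concrete, here is a toy simulation of the case described above. It is a sketch under stated assumptions, not real ZooKeeper behavior: `FakeZk` and `Client` are hypothetical in-memory stand-ins, and the simulation simply assumes that the server expires client1's session before client1 learns about it.

```python
class FakeZk:
    """In-memory stand-in for the lock node on a ZooKeeper ensemble (illustration only)."""

    def __init__(self):
        self.children = {}  # child name -> id of the session that owns it
        self.next_seq = 0

    def create_ephemeral_sequential(self, session_id):
        # Step 1: create an ephemeral node with a sequence suffix.
        name = "guid-lock-%d" % self.next_seq
        self.next_seq += 1
        self.children[name] = session_id
        return name

    def expire_session(self, session_id):
        # Server side: drop every ephemeral node owned by the expired session.
        for name in [n for n, s in self.children.items() if s == session_id]:
            del self.children[name]

    def get_children(self):
        # Children sorted by sequence number, lowest first.
        return sorted(self.children, key=lambda n: int(n.rsplit("-", 1)[1]))


class Client:
    def __init__(self, zk, session_id):
        self.zk = zk
        self.node = zk.create_ephemeral_sequential(session_id)
        self.believes_locked = False  # the client's local view only

    def try_acquire(self):
        # Steps 2 and 3: the lowest sequence number holds the lock.
        self.believes_locked = self.zk.get_children()[0] == self.node
        return self.believes_locked


zk = FakeZk()
client1 = Client(zk, "session-1")  # creates guid-lock-0
client2 = Client(zk, "session-2")  # creates guid-lock-1
client1.try_acquire()              # client1 acquires the lock
client2.try_acquire()              # client2 fails and would watch guid-lock-0

# Assumption: the server expires session-1, but client1 is not told yet.
zk.expire_session("session-1")
client2.try_acquire()              # client2 was notified and retries step 2

print(client1.believes_locked, client2.believes_locked)
```

In this model nothing updates client1's local belief until it hears from the server, which is exactly the window the question is about.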

Is this a valid scenario?

Thanks a lot!
