zookeeper-user mailing list archives

From Hulunbier <hulunb...@gmail.com>
Subject Re: Getting confused with the "recipe for lock"
Date Sat, 12 Jan 2013 10:30:53 GMT
Thanks Jordan,

> If client1's heartbeat fails its main watcher will get a Disconnect event

Suppose the network link between client1 and the server is of very low
quality (a high packet-loss rate?) but still fully functional.

Client1 may be happily sending heartbeat messages to the server without
noticing anything, while the ZK server could be unable to receive
heartbeat messages from client1 for a long period of time, which leads
the ZK server to time out client1's session and delete the ephemeral
node.

Thus, client1's session could be timed out by the ZK server without a
Disconnect event ever being triggered on the client side.

> Well behaving ZK applications must watch for this and assume that it no longer holds the
> lock and, thus, should delete its node. If client1 needs the lock again it should try to re-acquire
> it from step 1 of the recipe. Further, well behaving ZK applications must re-try node deletes
> if there is a connection problem. Have a look at Curator's implementation for details.
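
If I read that right, the holder's side boils down to something like the
sketch below (raw ZooKeeper Java API; the class LockHolder, the lockHeld
flag and the retry delay are my own names and choices, not Curator's):

    import java.util.concurrent.atomic.AtomicBoolean;

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical holder-side bookkeeping; not Curator's actual code.
    class LockHolder implements Watcher {
        private final ZooKeeper zk;
        private final String lockPath;                 // node created in step 1
        final AtomicBoolean lockHeld = new AtomicBoolean(true);

        LockHolder(ZooKeeper zk, String lockPath) {
            this.zk = zk;
            this.lockPath = lockPath;
        }

        @Override
        public void process(WatchedEvent event) {
            Watcher.Event.KeeperState state = event.getState();
            if (state == Watcher.Event.KeeperState.Disconnected
                    || state == Watcher.Event.KeeperState.Expired) {
                // We can no longer prove our ephemeral node still exists,
                // so stop acting as the lock holder right away.
                lockHeld.set(false);
            }
        }

        // Deleting our node must be retried across connection loss.
        void release() throws InterruptedException {
            while (true) {
                try {
                    zk.delete(lockPath, -1);
                    return;
                } catch (KeeperException.NoNodeException e) {
                    return;                 // already gone, e.g. session expired
                } catch (KeeperException.ConnectionLossException e) {
                    Thread.sleep(100);      // transient, retry the delete
                } catch (KeeperException e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }

The application would then check lockHeld before every piece of protected
work, not just once after acquiring.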

Thanks for pointing me to Curator's implementation; I will dig into
the source code.
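
For anyone else following the thread, basic usage of Curator's
InterProcessMutex looks roughly like this (the connect string, retry
policy and lock path are placeholders I picked; package names are those
of the Apache releases of Curator):

    import java.util.concurrent.TimeUnit;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class CuratorLockExample {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();
            try {
                InterProcessMutex lock = new InterProcessMutex(client, "/locknode");
                // Wait up to 10 seconds for the lock.
                if (lock.acquire(10, TimeUnit.SECONDS)) {
                    try {
                        // protected work goes here
                    } finally {
                        lock.release();
                    }
                }
            } finally {
                client.close();
            }
        }
    }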

But I still feel that, no matter how well a ZK application behaves, if
we use ephemeral nodes in the lock recipe, we cannot guarantee "at any
snapshot in time no two clients think they hold the same lock", which is
the fundamental requirement/constraint for a lock.

Mr. Andrey Stepachev suggested that I use a client-side timer to track
the session timeout. That sounds reasonable, but it implicitly imposes
some constraints on clock drift, which I did not expect in a solution
based on ZooKeeper (ZK is supposed to keep the animals well).

On Sat, Jan 12, 2013 at 4:20 AM, Jordan Zimmerman
<jordan@jordanzimmerman.com> wrote:
>
> If client1's heartbeat fails its main watcher will get a Disconnect event. Well behaving
> ZK applications must watch for this and assume that it no longer holds the lock and, thus,
> should delete its node. If client1 needs the lock again it should try to re-acquire it from
> step 1 of the recipe. Further, well behaving ZK applications must re-try node deletes if there
> is a connection problem. Have a look at Curator's implementation for details.
>
> -JZ
>
> On Jan 11, 2013, at 5:46 AM, Zhao Boran <hulunbier@gmail.com> wrote:
>
> > While reading ZooKeeper's recipe for locks
> > <http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks>,
> > I got confused:
> >
> > It seems that this recipe for a distributed lock cannot guarantee that *"at any
> > snapshot in time no two clients think they hold the same lock"*.
> >
> > But since ZooKeeper is so widely adopted, if there were such a mistake in
> > the reference doc, someone should have pointed it out a long time ago.
> >
> > So, what did I misunderstand? Please help me!
> >
> > Recipe-for-distributed-lock (from
> > http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks)
> >
> > Locks
> >
> > Fully distributed locks that are globally synchronous, *meaning at any
> > snapshot in time no two clients think they hold the same lock*. These can
> > be implemented using ZooKeeper. As with priority queues, first define a
> > lock node.
> >
> >   1. Call create( ) with a pathname of "*locknode*/guid-lock-" and the
> >   sequence and ephemeral flags set.
> >   2. Call getChildren( ) on the lock node without setting the watch flag
> >   (this is important to avoid the herd effect).
> >   3. If the pathname created in step 1 has the lowest sequence number
> >   suffix, the client has the lock and the client exits the protocol.
> >   4. The client calls exists( ) with the watch flag set on the path in the
> >   lock directory with the next lowest sequence number.
> >   5. if exists( ) returns false, go to step 2. Otherwise, wait for a
> >   notification for the pathname from the previous step before going to step 2.
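
For concreteness, the five steps above come out roughly like this against
the raw ZooKeeper Java API (my own sketch, not the reference
implementation; error handling and connection-loss retries are omitted,
and it assumes every client uses the same literal "guid-lock-" prefix so
that sorting the child names sorts by sequence number):

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class LockRecipeSketch {
        // Blocks until the lock is held; returns the path of our lock node.
        static String acquire(ZooKeeper zk, String lockDir)
                throws KeeperException, InterruptedException {
            // Step 1: create an ephemeral, sequential child under the lock node.
            String myPath = zk.create(lockDir + "/guid-lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            String myNode = myPath.substring(myPath.lastIndexOf('/') + 1);

            while (true) {
                // Step 2: list children without setting a watch (avoids the herd effect).
                List<String> children = zk.getChildren(lockDir, false);
                Collections.sort(children);   // same prefix, so this sorts by sequence

                // Step 3: the lowest sequence number holds the lock.
                int myIndex = children.indexOf(myNode);
                if (myIndex == 0) {
                    return myPath;
                }

                // Step 4: watch only the node with the next lowest sequence number.
                String previous = children.get(myIndex - 1);
                final CountDownLatch gone = new CountDownLatch(1);
                Stat stat = zk.exists(lockDir + "/" + previous, new Watcher() {
                    public void process(WatchedEvent event) {
                        gone.countDown();
                    }
                });

                // Step 5: if it is already gone, re-check; otherwise wait for the
                // notification, then go back to step 2.
                if (stat != null) {
                    gone.await();
                }
            }
        }
    }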
> >
> > Considering the following case:
> >
> >   - Client1 successfully acquired the lock (in step 3), with zk node
> >     "locknode/guid-lock-0";
> >
> >   - Client2 created node "locknode/guid-lock-1", failed to acquire the
> >     lock, and is watching "locknode/guid-lock-0";
> >
> >   - Later, for some reason (network congestion?), client1 failed to send
> >     heartbeat messages to the zk cluster on time, but client1 is still
> >     working perfectly, and assumes it still holds the lock.
> >
> >   - But ZooKeeper may think client1's session has timed out, and then:
> >     1. deletes "locknode/guid-lock-0"
> >     2. sends a notification to Client2 (or sends the notification first?)
> >     3. but cannot send a "session timeout" notification to client1 in time
> >        (due to network congestion?)
> >
> >   - Client2 gets the notification, goes to step 2, finds the only node
> >     "locknode/guid-lock-1", which was created by itself; thus, client2
> >     assumes it holds the lock.
> >
> >   - But at the same time, client1 also assumes it holds the lock.
> >
> > Is this a valid scenario?
> >
> > Thanks a lot!
>
