Subject: Re: Getting confused with the "recipe for lock"
From: Hulunbier <hulunbier@gmail.com>
To: user@zookeeper.apache.org
Date: Sat, 12 Jan 2013 18:30:53 +0800

Thanks Jordan,

> If client1's heartbeat fails its main watcher will get a Disconnect event

Suppose the network link between client1 and the server is of very low
quality (a high packet loss rate?) but still functional. Client1 may be
happily sending heartbeat messages to the server without noticing
anything, but the ZK server could be unable to receive heartbeats from
client1 for a long period of time, which leads the ZK server to time out
client1's session and delete the ephemeral node.

Thus, client1's session could be timed out by the ZK server without a
Disconnect event ever being triggered on the client.

> Well behaving ZK applications must watch for this and assume that it no
> longer holds the lock and, thus, should delete its node. If client1
> needs the lock again it should try to re-acquire it from step 1 of the
> recipe. Further, well behaving ZK applications must re-try node deletes
> if there is a connection problem. Have a look at Curator's
> implementation for details.
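If I read that advice correctly, the watcher side would look roughly
like the sketch below (my own minimal sketch against the raw ZooKeeper
Java API; the class name, the lock-node path, and the give-up policy
are my guesses, not Curator's actual code):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch of a "well behaving" lock holder: on Disconnected/Expired,
    // stop trusting the lock and re-try the delete until it succeeds.
    public class LockSessionWatcher implements Watcher {
        private final ZooKeeper zk;     // an already-connected handle
        private final String lockNode;  // e.g. "/locknode/guid-lock-0000000001"

        public LockSessionWatcher(ZooKeeper zk, String lockNode) {
            this.zk = zk;
            this.lockNode = lockNode;
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getState() == Event.KeeperState.Disconnected
                    || event.getState() == Event.KeeperState.Expired) {
                // Conservatively assume the server has expired (or will
                // expire) the session and deleted our ephemeral node.
                releaseLock();
            }
        }

        private void releaseLock() {
            while (true) {
                try {
                    zk.delete(lockNode, -1);  // -1 matches any version
                    return;
                } catch (KeeperException.NoNodeException e) {
                    return;  // node already gone (e.g. session expired)
                } catch (KeeperException.ConnectionLossException e) {
                    // transient; re-try the delete, as the recipe requires
                } catch (KeeperException e) {
                    return;  // e.g. SessionExpiredException: node is gone
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

Even with such a watcher, my worry still stands: the Disconnected event
may arrive (or be acted on) only after the server has already expired
the session and notified the next client in line.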
Thanks for pointing me to Curator's implementation; I will dig into the
source code.

But I still feel that, no matter how well a ZK application behaves, if we
use ephemeral nodes in the lock recipe we cannot guarantee that "at any
snapshot in time no two clients think they hold the same lock", which is
the fundamental requirement/constraint for a lock.

Mr. Andrey Stepachev suggested that I use a timer on the client side to
track the session timeout (a rough sketch of what I understand him to
mean is at the end of this mail). That sounds reasonable, but I think it
implicitly assumes a bound on clock drift - which is not something I
expected in a solution based on ZooKeeper (ZK is supposed to keep the
animals well).

On Sat, Jan 12, 2013 at 4:20 AM, Jordan Zimmerman wrote:
>
> If client1's heartbeat fails its main watcher will get a Disconnect
> event. Well behaving ZK applications must watch for this and assume
> that it no longer holds the lock and, thus, should delete its node. If
> client1 needs the lock again it should try to re-acquire it from step 1
> of the recipe. Further, well behaving ZK applications must re-try node
> deletes if there is a connection problem. Have a look at Curator's
> implementation for details.
>
> -JZ
>
> On Jan 11, 2013, at 5:46 AM, Zhao Boran wrote:
>
> > While reading the zookeeper's recipe for locks, I get confused:
> >
> > Seems that this recipe-for-distributed-lock can not guarantee *"at
> > any snapshot in time no two clients think they hold the same lock"*.
> >
> > But since zookeeper is so widely adopted, if there were such a
> > mistake in the reference doc, someone should have pointed it out a
> > long time ago.
> >
> > So, what did I misunderstand? Please help me!
> >
> > Recipe-for-distributed-lock (from
> > http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks)
> >
> > Locks
> >
> > Fully distributed locks that are globally synchronous, *meaning at
> > any snapshot in time no two clients think they hold the same lock*.
> > These can be implemented using ZooKeeper. As with priority queues,
> > first define a lock node.
> >
> > 1. Call create( ) with a pathname of "*locknode*/guid-lock-" and the
> >    sequence and ephemeral flags set.
> > 2. Call getChildren( ) on the lock node without setting the watch
> >    flag (this is important to avoid the herd effect).
> > 3. If the pathname created in step 1 has the lowest sequence number
> >    suffix, the client has the lock and the client exits the protocol.
> > 4. The client calls exists( ) with the watch flag set on the path in
> >    the lock directory with the next lowest sequence number.
> > 5. If exists( ) returns false, go to step 2. Otherwise, wait for a
> >    notification for the pathname from the previous step before going
> >    to step 2.
> >
> > Considering the following case:
> >
> > - Client1 successfully acquired the lock (in step 3), with zk node
> >   "locknode/guid-lock-0";
> >
> > - Client2 created node "locknode/guid-lock-1", failed to acquire the
> >   lock, and is watching "locknode/guid-lock-0";
> >
> > - Later, for some reason (network congestion?), client1 failed to
> >   send a heartbeat message to the zk cluster on time, but client1 is
> >   still perfectly working, and assumes it is still holding the lock.
> >
> > - But ZooKeeper may think client1's session has timed out, and then
> >   1. deletes "locknode/guid-lock-0",
> >   2. sends a notification to Client2 (or sends the notification
> >      first?),
> >   3. but can not send a "session timeout" notification to client1 in
> >      time (due to network congestion?).
> >
> > - Client2 gets the notification, goes to step 2, sees the only node
> >   "locknode/guid-lock-1", which was created by itself; thus, client2
> >   assumes it holds the lock.
> >
> > - But at the same time, client1 also assumes it holds the lock.
> >
> > Is this a valid scenario?
> >
> > Thanks a lot!
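For reference, here is roughly what I understand Andrey's client-side
timer suggestion to mean (my own sketch, not his code; driftMarginMs is
a hypothetical safety margin, and the whole approach only narrows the
window - it still rests on the clock-drift assumption I mentioned
above):

    import java.util.concurrent.TimeUnit;

    // Client-side "lease": after a Disconnected event, stop trusting
    // the lock once the session timeout (minus a drift margin) has
    // elapsed locally, even if no Expired event has arrived yet.
    public class LockLease {
        private final long sessionTimeoutMs;  // ZooKeeper#getSessionTimeout()
        private final long driftMarginMs;     // assumed bound on clock drift
        private volatile long disconnectedAtNanos = -1L;

        public LockLease(long sessionTimeoutMs, long driftMarginMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.driftMarginMs = driftMarginMs;
        }

        // Call these from the Watcher on connection state changes.
        public void onDisconnected() { disconnectedAtNanos = System.nanoTime(); }
        public void onReconnected()  { disconnectedAtNanos = -1L; }

        // Guard every piece of work done under the lock with this check.
        public boolean lockStillValid() {
            long since = disconnectedAtNanos;
            if (since < 0) {
                return true;  // connected: the server would notify us
            }
            long elapsedMs =
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - since);
            return elapsedMs < sessionTimeoutMs - driftMarginMs;
        }
    }

Even so, between the server expiring the session and lockStillValid()
turning false there remains a window whose size depends on how far the
two clocks' rates diverge - which is exactly the extra constraint I was
not expecting.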