zookeeper-user mailing list archives

From Ben Bangert <...@groovie.org>
Subject Re: Getting confused with the "recipe for lock"
Date Sat, 12 Jan 2013 17:39:43 GMT
On Jan 12, 2013, at 2:30 AM, Hulunbier <hulunbier@gmail.com> wrote:

> Suppose the network link between client1 and server is at very low
> quality (high packet loss rate?) but still fully functional.
> Client1 may be happily sending heart-beat-messages to server without
> noticing anything; but ZK server could be unable to receive
> heart-beat-messages from client1 for a long period of time, which
> leads ZK server to time out client1's session, and delete the ephemeral
> node.

If the ZK server doesn't get the ping, it won't reply to it, and the client *should be*
expecting a ping reply. However, it occurs to me that my Python implementation doesn't
actually check that it *gets* a ping reply, nor does the Java client afaik. The scenario
below is what will actually happen on the ZK server though, so the client will still react
appropriately.
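
A minimal sketch of what such a ping-reply check might look like on the client side. The
class and method names are hypothetical and not taken from any ZooKeeper client. It assumes
the session was just established when the watchdog is created, and it uses a monotonic
clock, so it measures intervals only and never compares wall-clock time with the server.

```python
import time


class PingWatchdog:
    """Tracks the last ping the server acknowledged (hypothetical helper)."""

    def __init__(self, session_timeout, clock=time.monotonic):
        self.session_timeout = session_timeout
        self.clock = clock
        # Assumption: the session was (re)established right now.
        self.last_confirmed = clock()
        self._pending_sent = None

    def on_ping_sent(self):
        # Keep the oldest unacknowledged send time; that is the conservative choice.
        if self._pending_sent is None:
            self._pending_sent = self.clock()

    def on_ping_reply(self):
        # The server saw the ping no earlier than the moment we sent it,
        # so count the session as refreshed from the send time, not the reply time.
        if self._pending_sent is not None:
            self.last_confirmed = self._pending_sent
            self._pending_sent = None

    def may_hold_lock(self, safety_factor=0.5):
        # Stop trusting the lock well before the server could expire the session.
        elapsed = self.clock() - self.last_confirmed
        return elapsed < self.session_timeout * safety_factor
```

The safety factor is an arbitrary margin. This still assumes the local clock does not run
much slower than the server's, so it narrows the window rather than closing it.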

> Thus, client's session could be timed out by ZK server, without
> triggering a Disconnect event.

Then the ZK server will tear down the connection. The client will definitely notice the other
side of the TCP connection going down and should react appropriately.
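
As a rough illustration of "react appropriately", here is a sketch of a guard that stops
assuming the lock is held as soon as the connection state degrades. The state names and
the LockGuard class are assumptions made for this example; a real client library reports
its own states through its own listener API.

```python
from enum import Enum


class ConnState(Enum):
    # Hypothetical connection states, modelled on what client libraries report.
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    LOST = "LOST"


class LockGuard:
    """Remembers whether it is still safe to assume the lock is held."""

    def __init__(self):
        self.held = False

    def acquired(self):
        self.held = True

    def on_state_change(self, state):
        # Any interruption means the ephemeral node may be gone already,
        # so give up the assumption immediately.
        if state in (ConnState.SUSPENDED, ConnState.LOST):
            self.held = False
```

Note that reconnecting does not set `held` back to True: after an interruption the client
would have to go through the recipe again from the start.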

>> Well behaving ZK applications must watch for this and assume that it no longer holds
>> the lock and, thus, should delete its node. If client1 needs the lock again it should try
>> to re-acquire it from step 1 of the recipe. Further, well behaving ZK applications must re-try
>> node deletes if there is a connection problem. Have a look at Curator's implementation for
> Thanks for pointing me to the "Curator's implementation", I will dig into
> the source code.
> But I still feel that, no matter how well a ZK application behaves,
> if we use ephemeral nodes in the lock-recipe, we cannot guarantee "at
> any snapshot in time no two clients think they hold the same lock",
> which is the fundamental requirement/constraint for a lock.

If both clients fetch the children, using ephemeral sequential nodes guarantees that every
node has a sequence number appended, and only one will have the lowest number. At the *very
least*, the lock holder knows that it has the lock for an amount of time equal to the session
expiration. If the connection is torn down or otherwise becomes disconnected, the client will
generate an event that code should listen for, so that it can react appropriately when there
is no longer a guarantee that the lock is held.
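
The "only one will have the lowest number" check can be sketched as follows. The node
names and the "lock-" prefix are made up for the example; the only ZooKeeper-specific
assumption is that sequential nodes get a 10-digit, zero-padded counter appended to their
name.

```python
def sequence_of(node_name):
    # Sequential nodes end in a 10-digit, zero-padded counter.
    return int(node_name[-10:])


def holds_lock(my_node, children):
    # The client whose node carries the lowest sequence number holds the lock.
    return my_node == min(children, key=sequence_of)
```

For example, with children `lock-0000000007`, `lock-0000000003` and `lock-0000000012`,
only the client that created `lock-0000000003` would consider itself the holder.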

> Mr. Andrey Stepachev suggested that I should use a timer in client
> side to track session_timeout, that sounds reasonable; but I think
> this implicitly implies some constraints on clock drift - which I did
> not expect in a solution based on Zookeeper (ZK is supposed to keep
> the animals well).

An alternative implementation that would alleviate this worry (though it introduces the risk
of deadlocks) would be to use plain sequential nodes rather than ephemeral sequential nodes.
This means that a lock would *never* be released until the client releases it, which might be
more appropriate for you if this lock is governing something so important. Though you will of
course need something else to alert you if it's possible you're in a deadlock scenario (the
client dies without releasing the lock).
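
One possible shape for that "something else" is a periodic check that flags lock nodes
which have existed for suspiciously long. This is only a sketch: the function name, the
input mapping and the threshold are assumptions, and the creation times are taken to be
milliseconds, as in the ctime field of a znode's stat. Node age overestimates how long the
lock has actually been held, so treat the result as a hint for a human or a monitoring
system, not as grounds for deleting the node automatically.

```python
def stale_lock_nodes(lock_nodes, now_ms, max_age_ms):
    """Return the names of lock nodes older than max_age_ms.

    lock_nodes maps node name -> creation time in milliseconds.
    """
    return sorted(
        name
        for name, created_ms in lock_nodes.items()
        if now_ms - created_ms > max_age_ms
    )
```

Because this compares timestamps, it is sensitive to clock differences between machines,
which is acceptable for an alert but would not be for deciding who owns the lock.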

||   Ben Bangert                                                 ||
||   ben@groovie.org                                             ||
||   http://be.groovie.org/                                      ||
