zookeeper-user mailing list archives

From Scott Fines <scottfi...@gmail.com>
Subject Re: question on lock recipe
Date Wed, 20 Jul 2011 21:12:10 GMT
We usually attach a Watcher to the ZooKeeper instance itself to listen for
connection-loss events (KeeperState.Disconnected). When one is received, we either

a). Pause the work and wait for a SyncConnected event. If no SyncConnected
event arrives within a certain timeout (usually the session expiration
timeout minus some fudge factor), we assume we've lost the lock and
cancel processing entirely.
b). Preemptively stop the work, and assume from minute 1 that we've lost the
lock.

Generally, I prefer b) to a), as it's simpler to code and understand, and
less fraught with timing issues. However, if it is VERY expensive to switch
which machine is doing the work, it can be preferable to perform a) instead.
Either way, these solutions lie outside of the Lock pattern itself, and belong
with the application logic.
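
For illustration only, here is a minimal Java sketch of how both options might be
tracked in application code. LockGuard and its State enum are hypothetical names,
not part of ZooKeeper; the enum is a local stand-in for the Disconnected,
SyncConnected and Expired values of Watcher.Event.KeeperState. The sketch assumes
a Watcher's process(WatchedEvent) method forwards event.getState() to
onStateChange(), and that the worker polls the guard between units of work. A
grace period of 0 approximates option b); a grace period of roughly the session
timeout minus a fudge factor approximates option a).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: decides whether lock-guarded work may continue. */
final class LockGuard {
    /** Local stand-in for ZooKeeper's connection states (an assumption of this sketch). */
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private static final long CONNECTED = -1L;
    private static final long LOST = -2L;

    private final long graceMillis;
    // Holds CONNECTED, LOST, or the (non-negative) time in ms of the disconnect.
    private final AtomicLong disconnectedAt = new AtomicLong(CONNECTED);

    /** graceMillis == 0 gives option b); a positive value gives option a). */
    LockGuard(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Call from the connection watcher whenever the connection state changes. */
    void onStateChange(State state, long nowMillis) {
        switch (state) {
            case DISCONNECTED: {
                // Remember only the first disconnect; a lost lock stays lost.
                disconnectedAt.compareAndSet(CONNECTED, nowMillis);
                break;
            }
            case SYNC_CONNECTED: {
                long since = disconnectedAt.get();
                if (since >= 0) {
                    // Resume only if we reconnected inside the grace period.
                    boolean inTime = nowMillis - since < graceMillis;
                    disconnectedAt.compareAndSet(since, inTime ? CONNECTED : LOST);
                }
                break;
            }
            case EXPIRED: {
                // The session is gone, so the lock is gone too.
                disconnectedAt.set(LOST);
                break;
            }
        }
    }

    /** True once the work should be cancelled entirely. */
    boolean isLockLost(long nowMillis) {
        long since = disconnectedAt.get();
        if (since == LOST) return true;
        if (since == CONNECTED) return false;
        return nowMillis - since >= graceMillis;
    }

    /** True while disconnected but still inside the grace period (option a). */
    boolean isPaused(long nowMillis) {
        long since = disconnectedAt.get();
        return since >= 0 && nowMillis - since < graceMillis;
    }
}
```

A worker would call isLockLost(System.currentTimeMillis()) before each unit of
work and abandon the task when it returns true, and under option a) would also
hold off while isPaused(...) returns true. This only narrows the window in which
two clients believe they hold the lock; it does not close it.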

Scott Fines

On Wed, Jul 20, 2011 at 4:03 PM, Will Johnson

> The Lock recipe has an overview description of "Fully distributed locks that
> are globally synchronous, meaning at any snapshot in time no two clients
> think they hold the same lock."  We've implemented this pattern but we've
> run into an issue handling ZooKeeper errors that seems to violate the
> semantics of 'no two clients think they have the lock.'  For example:
> Thread1.Client1.lock();
> Thread2.Client2.lock();
> // client1 gets the lock so he starts some work
> Thread1.client1.doWork();
> // but now I get a session timeout
> // in the worst case it's because the doWork() method caused a full GC that
> // took > sessionTimeout
> // my client then has to reconnect with a new session ID
> Thread1.client1.reconnect();
> But now my question is, how have people handled this case to notify
> Thread1.client1 that he is no longer holding the lock?  Without a lot of
> pedantic calls to Thread1.client1.doIStillHaveTheLock() inside the doWork()
> method it seems like 2 clients both think they have the lock.  Even if you
> make repeated calls to check the state of your lock you still have small
> windows of time where 2 clients are in the lock.  I could interrupt Thread1
> when reconnecting but if you're using the lock for multithreaded
> synchronization that won't help.
> I realize the limitations of ZooKeeper in this case but I also hope someone
> else has solved this problem intelligently before.
> - will
