curator-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: InterProcessMutex acquire times out and then _succeeds_?
Date Thu, 17 Oct 2013 15:08:01 GMT
This sounds suspiciously like a misuse of a watcher somewhere. Are you doing other things with
ZooKeeper? ZooKeeper watches are single-threaded: if you have a watcher that blocks (or otherwise
takes a long time), it will prevent other ZooKeeper recipes from functioning. See this Tech Note: https://cwiki.apache.org/confluence/display/CURATOR/TN1
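To make the single-thread point concrete without a running ensemble, here is a minimal, self-contained Java sketch. It does not use the ZooKeeper or Curator APIs. The client's event thread is modelled by a single-thread executor, and the names (EventThreadDemo, delayOfSecondEvent, SLOW_WORK_MS) are hypothetical and for illustration only. A watcher that does slow work inline delays the next event by the full length of that work. A watcher that hands the work to a thread it owns does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventThreadDemo {
    // Hypothetical duration of the slow work a misbehaving watcher does.
    static final long SLOW_WORK_MS = 500;

    static void sleepMillis(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    /**
     * Delivers two events, in order, on one simulated event thread and returns
     * how many milliseconds passed before the second event ran.
     * offload == false: the first handler does its slow work inline (blocking).
     * offload == true:  the first handler hands the work to a separate worker.
     */
    public static long delayOfSecondEvent(boolean offload) throws InterruptedException {
        ExecutorService eventThread = Executors.newSingleThreadExecutor(); // stand-in for the client's event thread
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch secondRan = new CountDownLatch(1);
            long start = System.nanoTime();
            // Event 1: an application watcher with slow work to do.
            eventThread.execute(() -> {
                if (offload) {
                    worker.execute(() -> sleepMillis(SLOW_WORK_MS));
                } else {
                    sleepMillis(SLOW_WORK_MS);
                }
            });
            // Event 2: stands in for a notification some other recipe is waiting on.
            eventThread.execute(secondRan::countDown);
            secondRan.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocking watcher:   second event ran after ~" + delayOfSecondEvent(false) + " ms");
        System.out.println("offloading watcher: second event ran after ~" + delayOfSecondEvent(true) + " ms");
    }
}
```

The design choice it illustrates is the usual one: keep watcher callbacks short, and move anything slow or blocking onto a thread you own.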

Other than that, I haven't heard of this problem. If you can provide a test case, that would
help.

-Jordan

On Oct 17, 2013, at 7:47 AM, Chris Jeris <cjeris@brightcove.com> wrote:

> We have run into a knotty problem with InterProcessMutex, where calls to .acquire will
> expend their full timeout and then _succeed_ in acquiring the lock. This generally happens
> when a different server process was the last one to hold the lock, but it does not happen
> every time that is the case.
> 
> The lock is a single InterProcessMutex object per server process (= server machine),
> all on a single persistent ZooKeeper node name (the object being access-controlled is a single
> piece of shared state, whose data is not itself stored in ZK). The problem arises in the
> context of a test suite where requests to our server cluster are issued serially, so there
> is basically no competing traffic on the lock, although there is traffic on the ZK cluster
> from other applications. The frequency of acquires on this lock is not excessive (on the order
> of 1 per second), and we are reasonably certain our client code is not holding the lock longer
> than it should.
> 
> The problem does not seem to be sensitive to the exact value of the timeout. If we set
> it to 15 seconds, we see lock acquires taking 15 seconds and then succeeding; if we set it
> to 60 seconds, we see them taking 60 seconds and then succeeding.
> 
> Right now we observe the problem with Curator 2.1.0 against both ZK 3.3.6 and 3.4.5.
> 
> Is this a known or familiar issue?  Does it sound like we're doing something wrong?
> 
> thanks, Chris Jeris
> -- 
> Chris Jeris
> cjeris@brightcove.com
> freenode/twitter/github: ystael
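One way to gather evidence for the test case Jordan asks for is to time each acquire and flag the exact symptom described above: a call that used (nearly) its whole timeout and still returned true. The sketch below is a hypothetical helper, not part of Curator. It accepts any call shaped like boolean acquire(long, TimeUnit), which matches InterProcessMutex.acquire(long, TimeUnit) as well as java.util.concurrent.locks.Lock.tryLock(long, TimeUnit). The names AcquireTimer, TimedAcquire and Outcome, and the 90% threshold for "nearly the whole timeout", are assumptions for illustration. The demo uses a local ReentrantLock so that it runs without ZooKeeper.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class AcquireTimer {
    /** Hypothetical adapter for any call shaped like boolean acquire(long, TimeUnit). */
    @FunctionalInterface
    public interface TimedAcquire {
        boolean acquire(long time, TimeUnit unit) throws Exception;
    }

    /** Result of one timed acquire attempt. */
    public static final class Outcome {
        public final boolean acquired;
        public final long elapsedMillis;
        public final boolean suspicious; // succeeded only after ~the full timeout

        Outcome(boolean acquired, long elapsedMillis, boolean suspicious) {
            this.acquired = acquired;
            this.elapsedMillis = elapsedMillis;
            this.suspicious = suspicious;
        }
    }

    public static Outcome timedAcquire(TimedAcquire lock, long timeout, TimeUnit unit) throws Exception {
        long start = System.nanoTime();
        boolean acquired = lock.acquire(timeout, unit);
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Assumed threshold: at least 90% of the timeout counts as "the full timeout".
        boolean suspicious = acquired && elapsed >= unit.toMillis(timeout) * 9 / 10;
        return new Outcome(acquired, elapsed, suspicious);
    }

    public static void main(String[] args) throws Exception {
        ReentrantLock lock = new ReentrantLock(); // local stand-in; no ZooKeeper needed
        Outcome o = timedAcquire(lock::tryLock, 1, TimeUnit.SECONDS);
        System.out.println("acquired=" + o.acquired + " elapsedMs=" + o.elapsedMillis
                + " suspicious=" + o.suspicious);
        if (o.acquired) {
            lock.unlock();
        }
    }
}
```

In the real service, each suspicious outcome could be logged together with which process last held the lock. That would test the correlation Chris already suspects.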

