curator-dev mailing list archives

From Ben Bangert
Subject Re: CURATOR-3.0 tests
Date Thu, 02 Jun 2016 21:01:35 GMT
On Thu, Jun 2, 2016 at 1:28 PM, Jordan Zimmerman wrote:
> I believe there are two things going on:
> 1) This test uses the infinite versions of the APIs. For some reason, either
> the internal lock or the semaphore code is getting stuck in wait() when
> there’s a network outage and never wakes up. I have some theories I’m
> working on.
> 2) This is in the category of “How Did it Ever Work”. I’m cc’ing Ben Bangert
> because it was his algorithm I used for InterProcessSemaphoreV2 and I want
> to run this past him. In the current implementation
> (
> 363-371), it seems to me that if there are more waiters on semaphores than
> there are available semaphores, it will wait infinitely. My solution is to
> sort the ZNode children and if the index of the acquiring client is less
> than the number of configured max leases, give that client the lease and be
> done. E.g.

I'm not sure how the Curator version works. I can only go over how the
Python Kazoo client works, and it's been a while, so I had to refresh my
memory from the code.

In Kazoo, there's a lock node for a given semaphore, and a lease pool
node, which has a child ephemeral node per lease holder. The only
client allowed to add its ephemeral node to the lease pool node is the
lock holder. Clients that have already acquired a lease may delete
their node at any time to release their lease.

The lock works per the standard lock recipe, so all lock waiters are
in line, and will wake per the standard lock recipe for lease
acquisition fairness.
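
As a rough illustration of the queueing described above, here is a minimal
sketch of the decision each waiter makes in the standard ZooKeeper lock
recipe: the contender with the lowest sequence number holds the lock, and
every other contender watches the node immediately ahead of it. The function
name `lock_step` and the `lock-` node names are hypothetical, chosen for
illustration; this is not Kazoo's or Curator's actual code.

```python
def sequence_number(node_name):
    """Extract the numeric suffix ZooKeeper appends to sequential nodes."""
    return int(node_name.rsplit("-", 1)[1])


def lock_step(children, my_node):
    """Decide what a lock contender should do next.

    Returns None if my_node holds the lock (lowest sequence number),
    otherwise the name of the node directly ahead of it, which is the
    one it should watch for deletion.
    """
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(my_node)
    if index == 0:
        return None  # first in line: this contender holds the lock
    return ordered[index - 1]  # wake only when the predecessor goes away


if __name__ == "__main__":
    # Hypothetical children of a lock node, in arbitrary order.
    children = ["lock-0000000007", "lock-0000000003", "lock-0000000005"]
    for node in children:
        print(node, "->", lock_step(children, node))
```

Because each waiter watches only its predecessor, waiters wake one at a time
in arrival order, which is where the fairness comes from.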

The client that acquires the lock gets to create a lease node, unless
there are already as many lease child nodes as the lease pool node
allows. In that case, it sets a watch on the lease pool node and waits
for a lease child to go away. (This was a crucial difference from
Curator, which had nodes watching specific lease-holding nodes in a
sorted line of some sort, resulting in possible lease starvation,
afaik.)

There should be no indefinite waiting, since as soon as a lease node is
deleted, the lock holder wakes and gets to create its node (and in my
tests it does so).
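
The scheme described in this message can be sketched in memory, without
ZooKeeper, to show why more waiters than leases should not lead to an
indefinite wait. This is only an approximation under stated assumptions: a
`threading.Lock` stands in for the lock node (it does not guarantee the FIFO
ordering of the real lock recipe), a `threading.Condition` stands in for the
watch on the lease pool node, and a set stands in for the ephemeral lease
children. The names `LeasePoolSemaphore` and `run_demo` are hypothetical and
are not part of Kazoo or Curator.

```python
import threading
import time


class LeasePoolSemaphore:
    """In-memory stand-in for the lock node plus lease pool node layout."""

    def __init__(self, max_leases):
        self.max_leases = max_leases
        self._lock = threading.Lock()       # stands in for the lock node
        self._pool = threading.Condition()  # stands in for the pool watch
        self._leases = set()                # stands in for lease children

    def acquire(self, client_id):
        # Only the lock holder may add a lease to the pool.
        with self._lock:
            with self._pool:
                # Pool full: wait until any lease child goes away.
                while len(self._leases) >= self.max_leases:
                    self._pool.wait()
                self._leases.add(client_id)

    def release(self, client_id):
        # Releasing never needs the lock, so it cannot be blocked by a
        # lock holder that is waiting for a free lease.
        with self._pool:
            self._leases.discard(client_id)
            self._pool.notify_all()

    def lease_count(self):
        with self._pool:
            return len(self._leases)


def run_demo(num_clients=5, max_leases=2):
    """Run more clients than leases; return (finished ids, peak leases)."""
    sem = LeasePoolSemaphore(max_leases)
    completed = []
    peak = [0]
    guard = threading.Lock()

    def worker(client_id):
        sem.acquire(client_id)
        with guard:
            peak[0] = max(peak[0], sem.lease_count())
        time.sleep(0.01)  # pretend to do work while holding the lease
        sem.release(client_id)
        with guard:
            completed.append(client_id)

    threads = [
        threading.Thread(target=worker, args=(i,), daemon=True)
        for i in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return sorted(completed), peak[0]


if __name__ == "__main__":
    finished, peak_leases = run_demo()
    print("finished clients:", finished)
    print("peak concurrent leases:", peak_leases)
```

The point of the design is that the waiting lock holder is woken by any lease
deletion rather than by the deletion of one specific lease node, so every
queued client should eventually get its turn.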

It sounds like Curator is using a different algorithm, since it has
nodes sorting their position to determine whether they have a lease.

