curator-user mailing list archives

From <>
Subject Re: Reaper/ChildReaper usage
Date Mon, 06 Apr 2015 21:39:40 GMT
I’ve still been too swamped to test this yet, but a second question has arisen.

Imagine this scenario:

0. A single static CuratorFramework instance.

1. The ZooKeeper ensemble explodes.

2. A ConnectionStateListener detects this and sets an AtomicBoolean to true.

3. The AtomicBoolean is reset to false once the ConnectionStateListener sees a connection again.

4. In the meantime, attempts to get/release locks check the boolean and throw an exception.

5. The LockFactory also detects this and, instead of handing out a distributed lock
implementation, hands out a JVM-local lock. It switches back once the connection is
reestablished, and many warnings are logged.

The idea was that we would fall back to JVM-local locking during this period (and then
switch back). [Yes, there are a few dubious aspects.]
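The switching described above can be sketched roughly like this. All names here (FallbackLockFactory, lockFor, onConnectionChange) are my own invention, and a ReentrantLock stands in for the Curator recipe, since the real factory isn't shown in this thread:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the fallback scheme described above.
// zkConnected would be flipped by the ConnectionStateListener; the
// "distributed" lock is stubbed with a ReentrantLock because Curator
// itself is not on the classpath here.
class FallbackLockFactory {
    private final AtomicBoolean zkConnected = new AtomicBoolean(true);
    private final ConcurrentMap<String, Lock> locks = new ConcurrentHashMap<>();

    // Called from the connection listener on state changes.
    public void onConnectionChange(boolean connected) {
        zkConnected.set(connected);
    }

    // Hands out a distributed lock while connected, a JVM-local
    // fallback while disconnected (logging a warning in real code).
    public Lock lockFor(String path) {
        String key = zkConnected.get() ? "zk:" + path : "jvm:" + path;
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    private FallbackLockFactory() {}

    public FallbackLockFactory(boolean initiallyConnected) {
        zkConnected.set(initiallyConnected);
    }
}
```

One of the dubious aspects alluded to: during the fallback window the JVM-local lock provides no cross-process exclusion at all, so two boxes can hold "the same" lock simultaneously.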

Anyway, this is what happened in staging:

The ensemble exploded.

The ConnectionStateListener detected it and set the AtomicBoolean to true.

The logs show the locks continued trying to be acquired - FOREVER! (well, at least 30-40 minutes).

Restarting the ZooKeeper ensemble led to ZooKeeper complaining about exceeding the 60-client connection limit.

Only restarting the box running the distributed locking finally recovered things.

Is there no way to have lock.acquire() finally give up on connection loss? Even the timed
lock.acquire() doesn’t exit.
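One workaround sketch, assuming the underlying acquire honors thread interruption (lockInterruptibly-style, as java.util.concurrent locks do): track every thread blocked in acquire and interrupt them all from the connection-loss callback. InterruptibleAcquire and abortAllWaiters are hypothetical names, and a plain Lock stands in for the distributed recipe:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: register waiting threads so a connection-loss
// callback can interrupt them, forcing a blocked acquire to give up.
class InterruptibleAcquire {
    private final Set<Thread> waiters = ConcurrentHashMap.newKeySet();

    public void acquire(Lock lock) throws InterruptedException {
        waiters.add(Thread.currentThread());
        try {
            lock.lockInterruptibly();   // responds to interrupt, unlike lock()
        } finally {
            waiters.remove(Thread.currentThread());
        }
    }

    // Called from the ConnectionStateListener when the ensemble is lost:
    // every thread currently blocked in acquire gets an InterruptedException.
    public void abortAllWaiters() {
        waiters.forEach(Thread::interrupt);
    }
}
```

Whether this helps with Curator specifically depends on its acquire propagating the interrupt rather than swallowing it and retrying; I haven't verified that against 2.7.x.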

From: Jordan Zimmerman
Sent: ‎Tuesday‎, ‎March‎ ‎31‎, ‎2015 ‎11‎:‎26‎ ‎AM

OK - if you don’t mind, please build from source and see if it fixes your issue.


On March 31, 2015 at 1:26:02 PM, ( wrote:

FYI: it looks from GitHub that this was not merged until after the 2.7.1 release.

From: Jordan Zimmerman
Sent: ‎Tuesday‎, ‎March‎ ‎31‎, ‎2015 ‎10‎:‎32‎ ‎AM

It looks like CURATOR-173 was possibly released in Curator 2.7.1. Scott Blum needs to respond
on this.


On March 31, 2015 at 12:31:54 PM, ( wrote:

Confirmed, by the way, that InterProcessMutex is reaped properly. However, those locks cannot
be reentrant across threads in the same process, so I am wondering if I should pull together
my own patch from the ticket?

From: David Kesler
Sent: ‎Tuesday‎, ‎March‎ ‎31‎, ‎2015 ‎10‎:‎17‎ ‎AM

How are you constructing your ChildReaper?  It sounds like you’re constructing a ChildReaper
using the path of the lock itself, and the path the ChildReaper is watching is getting deleted
(though I’m not sure why).  If you’re planning on having a number of locks of the form
/lock/calendar/uuid1, /lock/calendar/uuid2, etc., you should create a single ChildReaper
at startup that uses /lock/calendar as its path.  This will ensure that the children of
/lock/calendar (that is, your uuid locks) get reaped.  You don’t need to add
/lock/calendar/uuid to your ChildReaper directly.
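Against the Curator 2.x API, that advice might look roughly like the following fragment (not compilable as-is; the client setup and mode choice are placeholders):

```java
// One ChildReaper on the parent path, created once at application startup,
// rather than one per /lock/calendar/uuidN lock node.
CuratorFramework client = /* built and started elsewhere */;
ChildReaper childReaper =
    new ChildReaper(client, "/lock/calendar", Reaper.Mode.REAP_UNTIL_GONE);
childReaper.start();

// ... application runs; empty children of /lock/calendar get reaped ...

childReaper.close();   // on shutdown
```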


As a side note,  if you’re using InterProcessSemaphoreMutex, there’s currently an issue
with ChildReaper in 2.7 ( which should hopefully
be fixed in the next release.  If you can, you may want to consider InterProcessMutex instead.


From: []
Sent: Tuesday, March 31, 2015 12:57 PM
Subject: Reaper/ChildReaper usage


Hi, I’m using the InterProcessSemaphoreMutex for a distributed locking recipe.


A typical path for a lock might be




I’d assume these paths need to be cleaned up eventually, so I’ve tried using ChildReaper
and Reaper to do so after I unlock the lock.


ChildReaper kind of works. If I add /lock/calendar/uuid, it happily removes the children: the
log shows it removing the leases and locks, and the node itself is shown as gone in zkClient.
However, it then begins complaining in a seemingly endless loop that the path is gone,
despite my trying Mode.REAP_UNTIL_DELETE and Mode.REAP_UNTIL_GONE.


Reaper does nothing, probably because /lock/calendar/uuid has children.


Am I missing something? Do I not need to clean up these locks? What do I need to worry about,
concurrency-wise?