zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Abandoned exists() watches are forever
Date Fri, 02 Dec 2011 21:58:11 GMT
Good catch. I don't think we can do anything automatically to
resolve this. However, there is a JIRA pending that would at least
allow you to remove the watch when this does occur:
https://issues.apache.org/jira/browse/ZOOKEEPER-442

Patrick

On Fri, Dec 2, 2011 at 1:36 PM, Robert Crocombe <rcrocomb@gmail.com> wrote:
> Suppose you set an exists() watch on a node, e.g. in Groovy:
>
> def latch = new CountDownLatch(1)
> // Watcher built from a map of closures: process() releases the latch
> def watcher = [
>     process: { event -> latch.countDown(); log.debug("fired latch on event $event") },
>     toString: { "" }
> ] as Watcher
> def stat = zooKeeper.exists("$lockParentNode/$toWatch", watcher)
> if (stat != null) {
>    // Okay, we've set watch: wait for an event and try again
>    log.debug("Set watch on less than me '$toWatch': blocking until an event occurs which may let us acquire")
>    latch.await()
> } else {
>    // Dang!  Person immediately less than us is gone, try again
>    // This is moderately weird unless they were the only ones
>    // less than us and so might have owned the lock and just
>    // released it
>    log.debug("Node '$toWatch' gone when setting watch: trying again to acquire")
> }
>
> Suppose that exists() does return null.  It appears to be the case that the
> watch is still registered (both from the evidence below plus a cursory
> examination of the ZooKeeper.java client code).  In my
> case "$lockParentNode/$toWatch" is ultimately a sequential ephemeral node
> that will never ever occur again (part of yet another implementation of a
> ZooKeeper lock).  Thus, I believe this watch will remain until the session
> that created it is removed, which for us could be months.  Basically we're
> leaking a Closure and associated CountDownLatch for each time the node to
> be watched is deleted in the interval between when we initially look for it
> and when exists() returns null.  I only noticed it when playing with "wchc"
> as part of trying to understand a lost watch.
>
> 0x233a3c1db310006
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004876
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004234
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004684
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004588
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003118
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003772
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005206
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000001876
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004924
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002020
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005170
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000006526
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002260
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002920
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004414
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005848
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005278
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005752
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005380
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004360
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004624
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002728
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000001846
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004264
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000006142
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004660
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005956
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000004810
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002428
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003274
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003370
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000002398
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003712
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000003652
>        /plexus/slaves/grid279/lock/x-233a3c1db310003-0000005314
>
> Does this seem like a correct understanding to those with a deeper
> understanding of ZooKeeper internals, and does it seem like a problem worth
> rectifying?
>
> --
> Robert Crocombe
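
To make the leak described above concrete, here is a minimal toy model in Java. It is not the ZooKeeper client: `ToyWatchModel`, `ToyWatcher`, and `removeWatch` are hypothetical names invented for illustration. The sketch assumes only what the thread describes, namely that an exists() watch stays registered even when the node is absent and is cleared only when an event fires for that path. `removeWatch` stands in for the kind of explicit removal the pending JIRA would allow.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model of exists()-style watch bookkeeping (assumption: not real ZooKeeper code). */
class ToyWatchModel {
    /** Hypothetical stand-in for a watcher callback. */
    interface ToyWatcher {
        void process(String path);
    }

    private final Set<String> nodes = new HashSet<>();
    private final Map<String, Set<ToyWatcher>> existWatches = new HashMap<>();

    /** Creates a node and fires any watches waiting on that path. */
    void create(String path) {
        if (nodes.add(path)) {
            fire(path);
        }
    }

    /** Deletes a node and fires any watches waiting on that path. */
    void delete(String path) {
        if (nodes.remove(path)) {
            fire(path);
        }
    }

    /** Registers the watcher whether or not the node exists, as described in the thread. */
    boolean exists(String path, ToyWatcher watcher) {
        existWatches.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
        return nodes.contains(path);
    }

    /** Hypothetical explicit removal; returns true if the watcher was registered. */
    boolean removeWatch(String path, ToyWatcher watcher) {
        Set<ToyWatcher> watchers = existWatches.get(path);
        if (watchers == null || !watchers.remove(watcher)) {
            return false;
        }
        if (watchers.isEmpty()) {
            existWatches.remove(path);
        }
        return true;
    }

    /** Number of watchers still held in the registry. */
    int pendingWatchCount() {
        int count = 0;
        for (Set<ToyWatcher> watchers : existWatches.values()) {
            count += watchers.size();
        }
        return count;
    }

    /** Watches are one-shot: they are dropped from the registry when they fire. */
    private void fire(String path) {
        Set<ToyWatcher> watchers = existWatches.remove(path);
        if (watchers != null) {
            for (ToyWatcher w : watchers) {
                w.process(path);
            }
        }
    }

    public static void main(String[] args) {
        ToyWatchModel zk = new ToyWatchModel();
        ToyWatcher[] watchers = new ToyWatcher[3];

        // Watch three sequential nodes that are already gone and will never be recreated.
        for (int i = 0; i < watchers.length; i++) {
            final int id = i;
            watchers[i] = p -> System.out.println("watcher " + id + " fired on " + p);
            zk.exists("/lock/x-" + i, watchers[i]); // returns false, watch stays registered
        }
        System.out.println("pending watches after exists(): " + zk.pendingWatchCount());

        // With an explicit removal call, the caller can clean up when exists() reports no node.
        for (int i = 0; i < watchers.length; i++) {
            zk.removeWatch("/lock/x-" + i, watchers[i]);
        }
        System.out.println("pending watches after removal: " + zk.pendingWatchCount());
    }
}
```

In this model the three watchers stay in the registry until they are removed by hand, which mirrors the growing list of paths in the "wchc" output quoted above. In a lock recipe, the natural place for such a cleanup would be the else branch that handles exists() returning null.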
