curator-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cameron McKenzie <cammcken...@apache.org>
Subject Re: LeaderLatch.checkLeadership() logic
Date Thu, 08 Jun 2017 00:42:59 GMT
hey Jeff,
Your assessment looks pretty spot on to me. I guess it's never come up
before as it's an edge case and for most use cases I imagine that the
consequences are minimal (the order of leadership just changes and it would
take a bit longer for the leadership to occur).

If you'd like to create a PR, preferably with a unit test reproducing the
issue if possible, I'd be happy to review it.
cheers
Cam



On Thu, Jun 8, 2017 at 10:15 AM, Jeff Clites <jclites@mac.com> wrote:

> At the bottom of the LeaderLatch.checkLeadership() method (in Curator
> 2.12.0), the following is called to set a watch on the waiter one spot
> ahead of us, as expected:
>
>  String watchPath = sortedChildren.get(ourIndex - 1);
>  ...
>  client.getData().usingWatcher(watcher).inBackground(
> callback).forPath(ZKPaths.makePath(latchPath, watchPath));
>
> When the watch fires and has event type NodeDeleted, then
> LeaderLatch.getChildren() is called to determine if we are now the leader,
> again as expected.
>
> But I don't understand the following. The code for the callback (called in
> response to the getData() completing) has this:
>
>    if ( event.getResultCode() == KeeperException.Code.NONODE.intValue() )
>    {
>        // previous node is gone - reset
>        reset();
>    }
>
> It seems that this should be a call to getChildren(), rather than to
> reset(), since this means that the waiter ahead of us has disappeared (the
> same situation we were watch-ing for, but it happened before we could even
> set a watch), whereas reset() would be the right reaction if our path had
> disappeared. It looks to me like this code "thinks" it's a watch about our
> node, rather than the one-ahead-of-use node. I think this would cause us to
> delete and re-create our node, effectively moving us to the back of the
> wait queue, rather than making us the leader now.
>
> Is this a bug, or am I misinterpreting something?
>
> Looking at the history of this code, it looks like it did in fact used to
> check for leadership at this point, and got changed when the code was
> updated to use background calls (in commit sha ff4ec29f).
>
> Thanks in advance,
>
> Jeff Clites
>

Mime
View raw message