curator-dev mailing list archives

From Germán Blanco (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CURATOR-3) LeaderLatch race condition causing extra nodes to be added in Zookeeper Edit
Date Wed, 18 Sep 2013 08:05:51 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770549#comment-13770549 ]

Germán Blanco commented on CURATOR-3:
-------------------------------------

As far as I can see, they are both problems with an unexpected exit condition in the Leader Election
recipes, which are both based on InterProcessMutex. In both cases there is an ephemeral node
that is not being handled by any thread but is still linked to one of the existing sessions.
This ephemeral node blocks the election process. The proposed solution of cleaning up old
ephemeral nodes linked to the session after a reconnection could solve both problems (at
least in my opinion).
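
To make the cleanup idea concrete, here is a minimal in-memory sketch. It is not Curator code: the class SessionCleanupSketch, the method removeOrphans and the node paths are all hypothetical, and a plain map stands in for ZooKeeper. In a real client the owner of each node would presumably come from Stat.getEphemeralOwner() compared against ZooKeeper.getSessionId().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SessionCleanupSketch {

    // Removes every node that is owned by our session but that no live
    // recipe instance is handling any more. Returns the removed paths.
    // ownerByPath is a stand-in for "ephemeral owner session id per znode".
    static List<String> removeOrphans(Map<String, Long> ownerByPath,
                                      long sessionId,
                                      Set<String> stillHandled) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = ownerByPath.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            boolean ours = entry.getValue().longValue() == sessionId;
            if (ours && !stillHandled.contains(entry.getKey())) {
                removed.add(entry.getKey());
                it.remove(); // in ZooKeeper this would be a delete of the znode
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Long> nodes = new TreeMap<>();
        nodes.put("/latch/n-0001", 7L);   // owned by another session
        nodes.put("/latch/n-0002", 42L);  // ours, still handled by a latch
        nodes.put("/latch/n-0003", 42L);  // ours, but nobody handles it
        List<String> removed =
            removeOrphans(nodes, 42L, Collections.singleton("/latch/n-0002"));
        System.out.println(removed); // prints [/latch/n-0003]
    }
}
```

Only the node that belongs to our session and has no handler is removed, so nodes of other sessions and of latches that are still running are left alone.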
                
> LeaderLatch race condition causing extra nodes to be added in Zookeeper Edit
> ----------------------------------------------------------------------------
>
>                 Key: CURATOR-3
>                 URL: https://issues.apache.org/jira/browse/CURATOR-3
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.0.0-incubating
>            Reporter: Jordan Zimmerman
>
> From https://github.com/Netflix/curator/issues/265
> Looks like there's a race condition in LeaderLatch. If LeaderLatch.close() is called
> at the right time while the latch's watch handler is running, the latch will place another
> node in Zookeeper after the latch is closed.
> Basically how it happens is this:
> 1) I have two processes contesting a LeaderLatch, ProcessA and ProcessB. ProcessA is
> leader.
> 2) ProcessA loses leadership somehow (it releases, its connection goes down, etc.)
> 3) This causes ProcessB's watch to get called, check the state is still STARTED, and
> if so the LeaderLatch will re-evaluate if it is leader.
> 4) While the watch handler is running, close() is called on the LeaderLatch on ProcessB.
> This sets the LeaderLatch state to CLOSED, removes the znode from ZK and closes off the LeaderLatch.
> 5) The watch handler has already checked that the state is STARTED, so it does a getChildren()
> on the latch path, and finds the latch's znode is missing. It goes ahead and calls reset(),
> which places a new znode in Zookeeper.
> Result: The LeaderLatch is closed, but there is still a node in Zookeeper that isn't
> associated with any LeaderLatch and won't go away until the session goes down. Subsequent
> LeaderLatches at this path can never get leadership while that session is up.
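
The interleaving in steps 3 to 5 can be replayed deterministically with a toy model. This is only a sketch and not the actual LeaderLatch source: ToyLatch, leaksNode and the node name are hypothetical, a HashSet stands in for the children of the latch path, and the "guarded" variant is one possible fix (re-checking the state under the same lock that close() takes), not necessarily the one applied in Curator.

```java
import java.util.HashSet;
import java.util.Set;

public class LatchRaceSketch {
    enum State { STARTED, CLOSED }

    static class ToyLatch {
        private final Set<String> znodes;     // stand-in for the latch path's children
        private final String node;            // this latch's own znode name
        private final boolean recheckInReset; // true = guarded variant
        private State state = State.STARTED;

        ToyLatch(Set<String> znodes, String node, boolean recheckInReset) {
            this.znodes = znodes;
            this.node = node;
            this.recheckInReset = recheckInReset;
            znodes.add(node);
        }

        // Step 4: mark the latch CLOSED and remove its znode.
        synchronized void close() {
            state = State.CLOSED;
            znodes.remove(node);
        }

        // Steps 3 and 5. 'interleaved' runs between the state check and
        // the children lookup, simulating close() arriving mid-handler.
        void onWatch(Runnable interleaved) {
            synchronized (this) {
                if (state != State.STARTED) {
                    return;                   // step 3: state check
                }
            }
            interleaved.run();                // step 4 happens here
            boolean missing;
            synchronized (this) {
                missing = !znodes.contains(node); // step 5: own znode is gone
            }
            if (missing) {
                reset();
            }
        }

        // Re-creates the znode. The guarded variant re-checks the state
        // while holding the lock that close() also takes.
        synchronized void reset() {
            if (recheckInReset && state != State.STARTED) {
                return;
            }
            znodes.add(node);
        }
    }

    // Returns true if a znode is left behind after the latch was closed.
    static boolean leaksNode(boolean recheckInReset) {
        Set<String> znodes = new HashSet<>();
        ToyLatch latch = new ToyLatch(znodes, "latch-0000000001", recheckInReset);
        latch.onWatch(latch::close);
        return !znodes.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("unguarded leaks: " + leaksNode(false)); // true
        System.out.println("guarded leaks:   " + leaksNode(true));  // false
    }
}
```

In the unguarded run the handler's stale STARTED check lets reset() add a znode for a latch that is already closed, which matches the result described above. The guarded run leaves nothing behind because reset() sees CLOSED before it writes.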

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
