curator-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-264) Leader election: Duplicate ephemeral nodes with same owner id
Date Wed, 23 Sep 2015 00:01:04 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903686#comment-14903686 ]

ASF GitHub Bot commented on CURATOR-264:
----------------------------------------

Github user Randgalt commented on the pull request:

    https://github.com/apache/curator/pull/106#issuecomment-142456662
  
    It was being un-namespaced inside of the CreateBuilder. Then, that resolved path was
    passed via findAndDeleteProtectedNodeInBackground to the Curator routines, which would
    resolve it again. So, that's twice, right? Yes.
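
The double resolution described in this comment can be illustrated with a minimal, self-contained sketch. It assumes that "resolving" a path means prepending a namespace prefix to a client-relative path. `NamespaceDemo`, `applyNamespace`, and `exists` are hypothetical names used only for illustration and are not Curator's API; the namespace `app` and the node path are made up as well.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for a namespaced client; this is NOT Curator's actual API.
final class NamespaceDemo {
    // Assumed meaning of "resolve": prepend the namespace to a client-relative path.
    static String applyNamespace(String namespace, String path) {
        return "/" + namespace + path;
    }

    // Simulated lookup: true only if the full path is present on the "server".
    static boolean exists(Set<String> serverNodes, String fullPath) {
        return serverNodes.contains(fullPath);
    }

    public static void main(String[] args) {
        // The only node the simulated server knows about (made-up path).
        Set<String> serverNodes = Collections.singleton("/app/leader/lock-0001");

        // Resolving once yields the path that actually exists.
        String once = applyNamespace("app", "/leader/lock-0001");
        // Resolving the already-resolved path again yields a path that does not exist.
        String twice = applyNamespace("app", once);

        System.out.println(once + " exists: " + exists(serverNodes, once));
        System.out.println(twice + " exists: " + exists(serverNodes, twice));
    }
}
```

Under these assumptions, the second lookup targets `/app/app/leader/lock-0001`, which is not on the server. That would be consistent with a lookup failing with a "no node" result even though the node exists under the singly resolved path.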


> Leader election: Duplicate ephemeral nodes with same owner id
> -------------------------------------------------------------
>
>                 Key: CURATOR-264
>                 URL: https://issues.apache.org/jira/browse/CURATOR-264
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework, Recipes
>    Affects Versions: 2.8.0
>            Reporter: Ole Hjalmar Herje
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 2.9.1
>
>         Attachments: testLog.txt, zkNodes.txt, zkTransactionLog.txt
>
>
> We sometimes experience failures in our leader-election functionality when we have network
> issues. When this situation occurs, we see that there are two ephemeral nodes in the ZooKeeper
> cluster for the same session, but there is no active leader.
> I have managed to recreate the same scenario by running a test locally and using iptables
> to simulate network issues. The debug log (see attachment) shows that findAndDeleteProtectedNodeInBackground
> does not delete the node because processResult in FindProtectedNodeCB receives a -101 (NoNode)
> result code. I suspect this can happen if the read is not synced? (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees)
> This also seems to be related to: 
> https://issues.apache.org/jira/browse/CURATOR-45 and
> https://issues.apache.org/jira/browse/CURATOR-79 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
