curator-dev mailing list archives

From "Ole Hjalmar Herje (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CURATOR-264) Leader election: Duplicate ephemeral nodes with same owner id
Date Wed, 23 Sep 2015 08:59:04 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904199#comment-14904199 ]

Ole Hjalmar Herje edited comment on CURATOR-264 at 9/23/15 8:58 AM:
--------------------------------------------------------------------

I have tested with the new version and I can confirm that it was an issue with the namespace.
Curator 2.8.0 tries to get children for path:
#Thread# Thread[Curator-LeaderSelector-0,5,main] this:org.apache.curator.framework.imps.GetChildrenBuilderImpl@1f41e94 forPath path /skatteinfo/iris/skatteinfo/iris/test/leader
and this results in NoNode:
#Thread# Thread[main-EventThread,5,main] this:org.apache.curator.framework.imps.CreateBuilderImpl$FindProtectedNodeCB@43f79b54 processResult resultcode -101
Branch CURATOR-264:
#Thread# Thread[Curator-LeaderSelector-1,5,main] this:org.apache.curator.framework.imps.GetChildrenBuilderImpl@6899dc91 forPath path /skatteinfo/iris/test/leader
#Thread# Thread[main-EventThread,5,main] this:org.apache.curator.framework.imps.CreateBuilderImpl$FindProtectedNodeCB@6d76a000 processResult resultcode 0
#Thread# Thread[main-EventThread,5,main] this:org.apache.curator.framework.imps.CreateBuilderImpl$FindProtectedNodeCB@6d76a000 processResult node /test/leader/_c_8f45af77-f782-4c13-a669-6e9a0e011db4-lock-0000000864
#Thread# Thread[main-EventThread,5,main] this:org.apache.curator.framework.imps.DeleteBuilderImpl@9bc3e3c DeleteBuilderImpl forPath /skatteinfo/iris/test/leader/_c_8f45af77-f782-4c13-a669-6e9a0e011db4-lock-0000000864

and everything seems to be OK. 
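
For illustration, the doubled path in the 2.8.0 trace is what you get when a namespace prefix is applied to a path that already carries it. The following is a minimal sketch in plain Java, not Curator's code: the class NamespaceSketch and the helper withNamespace are hypothetical, and the namespace skatteinfo/iris is an assumption inferred from the paths in the trace.

```java
final class NamespaceSketch {
    // Hypothetical helper: prefixes a path with a namespace, roughly what a
    // namespaced client does before sending a request to ZooKeeper.
    static String withNamespace(String namespace, String path) {
        return "/" + namespace + path;
    }

    public static void main(String[] args) {
        String namespace = "skatteinfo/iris"; // assumed, inferred from the trace
        String path = "/test/leader";

        // Applied once: the path the fix branch asks ZooKeeper for.
        String once = withNamespace(namespace, path);
        // Applied again to an already-prefixed path: the path 2.8.0 asks for.
        String twice = withNamespace(namespace, once);

        System.out.println(once);
        System.out.println(twice);
    }
}
```

The second path does not exist in ZooKeeper, which would be consistent with the NoNode (-101) result code in the 2.8.0 trace.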





> Leader election: Duplicate ephemeral nodes with same owner id
> -------------------------------------------------------------
>
>                 Key: CURATOR-264
>                 URL: https://issues.apache.org/jira/browse/CURATOR-264
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework, Recipes
>    Affects Versions: 2.8.0
>            Reporter: Ole Hjalmar Herje
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 2.9.1
>
>         Attachments: testLog.txt, zkNodes.txt, zkTransactionLog.txt
>
>
> We sometimes experience failures in our leader-election functionality when we have network
> issues. When this occurs, we see two ephemeral nodes in the ZooKeeper cluster for the same
> session, but there is no active leader.
> I have managed to recreate the same scenario by running a test locally and using iptables
> to simulate network issues. The debug log (see attachment) shows that findAndDeleteProtectedNodeInBackground
> does not delete the node because processResult in FindProtectedNodeCB receives a -101 (NoNode)
> result code. I suspect this can happen if the read is not synced? (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees)
> This also seems to be related to:
> https://issues.apache.org/jira/browse/CURATOR-45 and
> https://issues.apache.org/jira/browse/CURATOR-79
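
The failure described above can be sketched as follows. This is a simplified model in plain Java, not Curator's implementation: ProtectedNodeSketch, findProtectedNode and nodeToDelete are hypothetical names, and matching by name prefix is an assumption. Only the result codes 0 and -101 (NoNode) and the _c_<id>- shape of the node name are taken from the trace in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class ProtectedNodeSketch {
    static final int OK = 0;         // result code seen on the fix branch
    static final int NO_NODE = -101; // result code seen with 2.8.0

    // Returns the child whose name starts with the protected-id prefix, if any.
    static Optional<String> findProtectedNode(List<String> children, String protectedId) {
        String prefix = "_c_" + protectedId + "-";
        return children.stream().filter(name -> name.startsWith(prefix)).findFirst();
    }

    // Models the callback: only a successful child listing can yield a node to delete.
    static Optional<String> nodeToDelete(int resultCode, List<String> children, String protectedId) {
        if (resultCode != OK) {
            return Optional.empty(); // e.g. NoNode: the earlier node is left behind
        }
        return findProtectedNode(children, protectedId);
    }

    public static void main(String[] args) {
        String id = "8f45af77-f782-4c13-a669-6e9a0e011db4";
        List<String> children = Arrays.asList("_c_" + id + "-lock-0000000864");

        System.out.println(nodeToDelete(OK, children, id));
        System.out.println(nodeToDelete(NO_NODE, Collections.emptyList(), id));
    }
}
```

In this model a NoNode result leaves nothing to match against, so no delete is issued and a second ephemeral node for the same session can remain.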



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
