Return-Path:
X-Original-To: apmail-curator-dev-archive@minotaur.apache.org
Delivered-To: apmail-curator-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 589D118680
	for ; Wed, 23 Sep 2015 13:22:14 +0000 (UTC)
Received: (qmail 507 invoked by uid 500); 23 Sep 2015 13:22:04 -0000
Delivered-To: apmail-curator-dev-archive@curator.apache.org
Received: (qmail 454 invoked by uid 500); 23 Sep 2015 13:22:04 -0000
Mailing-List: contact dev-help@curator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@curator.apache.org
Delivered-To: mailing list dev@curator.apache.org
Received: (qmail 376 invoked by uid 99); 23 Sep 2015 13:22:04 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Sep 2015 13:22:04 +0000
Date: Wed, 23 Sep 2015 13:22:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@curator.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (CURATOR-264) Leader election: Duplicate ephemeral nodes with same owner id
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CURATOR-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904496#comment-14904496 ]

ASF GitHub Bot commented on CURATOR-264:
----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/curator/pull/106

> Leader election: Duplicate ephemeral nodes with same owner id
> -------------------------------------------------------------
>
>                 Key: CURATOR-264
>                 URL: https://issues.apache.org/jira/browse/CURATOR-264
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework, Recipes
>    Affects Versions: 2.8.0
>            Reporter: Ole Hjalmar Herje
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 2.9.1
>
>         Attachments: testLog.txt, zkNodes.txt, zkTransactionLog.txt
>
>
> We sometimes experience failure in our leader-election functionality when we have network issues. When this situation occurs we see that there are two ephemeral nodes in the zookeeper cluster for the same session but there is no active leader.
> I have managed to recreate the same scenario by running a test locally and use iptables to simulate network issues. The debug log (see attachment) shows that findAndDeleteProtectedNodeInBackground does not delete the node because processResult in FindProtectedNodeCB receives a -101 (NoNode) resultcode. I suspect this can happen if the read is not synched? (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees)
> This also seems to be related to:
> https://issues.apache.org/jira/browse/CURATOR-45 and
> https://issues.apache.org/jira/browse/CURATOR-79

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)