Date: Sun, 23 Feb 2014 10:11:20 +0000 (UTC)
From: "Jordan Zimmerman (JIRA)"
To: dev@curator.incubator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Commented] (CURATOR-3) LeaderLatch race condition causing extra nodes to be added in Zookeeper Edit

[
https://issues.apache.org/jira/browse/CURATOR-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909734#comment-13909734 ]

Jordan Zimmerman commented on CURATOR-3:
----------------------------------------

Is this related to CURATOR-73? Please test with the most recent push.

> LeaderLatch race condition causing extra nodes to be added in Zookeeper Edit
> ----------------------------------------------------------------------------
>
>                 Key: CURATOR-3
>                 URL: https://issues.apache.org/jira/browse/CURATOR-3
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.0.0-incubating
>            Reporter: Jordan Zimmerman
>             Fix For: TBD
>
>
> From https://github.com/Netflix/curator/issues/265
> Looks like there's a race condition in LeaderLatch. If LeaderLatch.close() is called at the right time while the latch's watch handler is running, the latch will place another node in Zookeeper after the latch is closed.
> Basically how it happens is this:
> 1) I have two processes contesting a LeaderLatch, ProcessA and ProcessB. ProcessA is leader.
> 2) ProcessA loses leadership somehow (it releases, its connection goes down, etc.)
> 3) This causes ProcessB's watch to get called, check the state is still STARTED, and if so the LeaderLatch will re-evaluate if it is leader.
> 4) While the watch handler is running, close() is called on the LeaderLatch on ProcessB. This sets the LeaderLatch state to CLOSED, removes the znode from ZK and closes off the LeaderLatch.
> 5) The watch handler has already checked that the state is STARTED, so it does a getChildren() on the latch path, and finds the latch's znode is missing. It goes ahead and calls reset(), which places a new znode in Zookeeper.
> Result: The LeaderLatch is closed, but there is still a node in Zookeeper that isn't associated with any LeaderLatch and won't go away until the session goes down. Subsequent LeaderLatches at this path can never get leadership while that session is up.
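
The interleaving in steps 3 to 5 of the quoted report is a check-then-act race: the handler reads the state once and acts on that stale answer after close() has run. Below is a minimal, deterministic sketch of that interleaving. It assumes an in-memory set as a stand-in for the children of the latch path, and it forces close() to land at step 4 by passing it in as a callback. The names LatchRaceSketch, onWatchUnsafe, onWatchGuarded and leakedNodes are hypothetical and are not part of Curator's API. The guarded variant shows one possible mitigation (re-checking the state under the same lock that close() takes) and is not necessarily the change made in Curator.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory model of the reported race; this is NOT Curator's LeaderLatch. */
class LatchRaceSketch {
    enum State { STARTED, CLOSED }

    // Stand-in for the child znodes under the latch path (assumption: no real ZooKeeper).
    final Set<String> znodes = new HashSet<>();
    final AtomicReference<State> state = new AtomicReference<>(State.STARTED);
    private String ourNode;
    private int seq = 0;

    LatchRaceSketch() {
        reset(); // the started latch owns one znode
    }

    /** Creates a fresh znode for this latch, as reset() is described to do in step 5. */
    private void reset() {
        ourNode = "latch-" + (++seq);
        znodes.add(ourNode);
    }

    /** Step 4: mark the latch CLOSED and remove its znode. */
    synchronized void close() {
        state.set(State.CLOSED);
        znodes.remove(ourNode);
    }

    /** Unsafe handler: checks the state once (step 3), then acts on it later (step 5). */
    void onWatchUnsafe(Runnable interleaved) {
        if (state.get() != State.STARTED) {
            return;
        }
        interleaved.run(); // close() lands here in the reported interleaving
        if (!znodes.contains(ourNode)) {
            reset(); // recreates a znode although the latch is already CLOSED
        }
    }

    /** Guarded handler: re-checks the state under the lock that close() also takes. */
    void onWatchGuarded(Runnable interleaved) {
        if (state.get() != State.STARTED) {
            return;
        }
        interleaved.run();
        synchronized (this) {
            if (state.get() == State.STARTED && !znodes.contains(ourNode)) {
                reset();
            }
        }
    }

    /** Runs the interleaving once and returns how many znodes remain after close(). */
    static int leakedNodes(boolean guarded) {
        LatchRaceSketch latch = new LatchRaceSketch();
        Runnable closeDuringHandler = latch::close;
        if (guarded) {
            latch.onWatchGuarded(closeDuringHandler);
        } else {
            latch.onWatchUnsafe(closeDuringHandler);
        }
        return latch.znodes.size();
    }
}
```

Calling LatchRaceSketch.leakedNodes(false) leaves one orphaned znode behind, matching the "Result" in the report, while LatchRaceSketch.leakedNodes(true) leaves none. In a real latch the create would be an asynchronous ZooKeeper call, so a guard of this kind would also need to handle a create that completes after close().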