From: "Flavio Paiva Junqueira (JIRA)"
To: zookeeper-dev@hadoop.apache.org
Date: Fri, 3 Apr 2009 09:08:12 -0700 (PDT)
Subject: [jira] Updated: (ZOOKEEPER-362) Issues with FLENewEpochTest

[ https://issues.apache.org/jira/browse/ZOOKEEPER-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Paiva Junqueira updated
ZOOKEEPER-362:
---------------------------------------------

    Status: Open  (was: Patch Available)

> Issues with FLENewEpochTest
> ---------------------------
>
> Key: ZOOKEEPER-362
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-362
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Flavio Paiva Junqueira
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-362.patch, ZOOKEEPER-362.patch
>
>
> I have been able to identify two reasons that cause FLENewEpochTest to fail:
> 1- There is a race condition that is triggered when two peers try to establish a connection to each other for leader election. Basically, if they start at roughly the same time, the server with the highest id will try to open two connections. The two competing connections cause one notification message to be lost, and this message happens to be critical for this two-process scenario;
> 2- The code to shut down a peer does not work well with the unit tests. For this particular unit test, we need to be able to shut down a peer completely to check the situation the test tries to reproduce. However, it seems that in some runs timing causes the other peers to believe it is still alive, and they end up electing it. This peer, however, eventually shuts down and leader election fails.
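The report does not say how the patch resolves the first issue. As an illustration only, the sketch below models one deterministic tie-break that leaves a single connection per pair of peers: a connection survives only if the peer with the higher id opened it, and when the lower-id peer dials first, the other side drops that socket and dials back. The function `surviving_connections`, the `(initiator, acceptor)` tuples, and the tie-break rule itself are assumptions for this sketch, not ZooKeeper's connection-manager code, and the model does not simulate the notification messages themselves.

```python
from collections import deque
from itertools import permutations


def surviving_connections(attempts, tie_break=True):
    """Return the connections left open after processing dial attempts.

    Each attempt is a hypothetical (initiator_sid, acceptor_sid) tuple,
    processed in the order given. With tie_break=True, a connection is kept
    only if the higher-id peer opened it; otherwise every attempt is kept.
    """
    queue = deque(attempts)
    kept = set()
    while queue:
        initiator, acceptor = queue.popleft()
        if not tie_break:
            # No rule: both peers keep their own connection to each other.
            kept.add((initiator, acceptor))
            continue
        pair = frozenset((initiator, acceptor))
        if any(frozenset(conn) == pair for conn in kept):
            continue  # this pair is already connected; drop the duplicate
        if initiator > acceptor:
            kept.add((initiator, acceptor))
        else:
            # Lower id dialed a higher id: the acceptor closes this socket
            # and dials back, so the surviving connection is opened by the
            # higher-id peer.
            queue.append((acceptor, initiator))
    return kept


if __name__ == "__main__":
    # Peers 1 and 2 start at about the same time and dial each other.
    for order in permutations([(1, 2), (2, 1)]):
        print(list(order), "->", surviving_connections(order))
    print("no tie-break ->",
          surviving_connections([(1, 2), (2, 1)], tie_break=False))
```

Whichever order the two simultaneous attempts are processed in, only `(2, 1)` survives; with `tie_break=False` both attempts stay open, which is the kind of competing pair of connections the first item describes.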