Date: Fri, 14 Sep 2012 10:02:08 +1100 (NCT)
From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1958378176.77814.1347577328245.JavaMail.jiratomcat@arcas>
In-Reply-To: <708488813.43277.1346914089785.JavaMail.jiratomcat@arcas>
Subject: [jira] [Resolved] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching

     [ https://issues.apache.org/jira/browse/HDFS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-3894.
-------------------------------

       Resolution: Fixed
    Fix Version/s: QuorumJournalManager (HDFS-3077)
     Hadoop Flags: Reviewed

Committed to branch, thanks for the review.

> QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
> --------------------------------------------------------------------------
>
>                 Key: HDFS-3894
>                 URL: https://issues.apache.org/jira/browse/HDFS-3894
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: hdfs-3894.txt
>
>
> TestQJMWithFaults.testRecoverAfterDoubleFailures fails very occasionally. Looking into it, the issue seems to be that an IPC server port can, by random chance, be reused between two different iterations of the test loop. The client then picks up and re-uses the existing IPC connection to the old server. However, the old server was shut down and restarted, so the old IPC connection is stale (i.e., disconnected). This causes the new client to get an EOF when it sends the "format()" call.
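
To make the failure mode above concrete, here is a minimal, network-free sketch in Java. It is not Hadoop code: FakeServer, FakeConnection, CachingClient, and runIteration are hypothetical names invented for illustration, and the cache is assumed to be keyed by port alone. The clearCache() step is just one way to show that a fresh connection avoids the problem; this message does not say what the committed fix actually does.

```java
import java.io.EOFException;
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; none of these classes exist in Hadoop.
class StaleConnectionSketch {

    /** Hypothetical stand-in for an IPC server bound to a port. */
    static final class FakeServer {
        final int port;
        private boolean running = true;

        FakeServer(int port) { this.port = port; }
        void shutdown() { running = false; }
        boolean isRunning() { return running; }
    }

    /** Hypothetical connection, tied to the server instance it was opened against. */
    static final class FakeConnection {
        private final FakeServer server;

        FakeConnection(FakeServer server) { this.server = server; }

        String call(String method) throws EOFException {
            // A connection to a server that has since shut down is stale.
            if (!server.isRunning()) {
                throw new EOFException("stale connection to port " + server.port);
            }
            return method + " ok";
        }
    }

    /** Assumed cache keyed only by port, so it cannot tell an old server from a new one. */
    static final class CachingClient {
        private final Map<Integer, FakeConnection> cache = new HashMap<>();

        FakeConnection connect(FakeServer server) {
            // Reuses whatever connection is already cached for this port.
            return cache.computeIfAbsent(server.port, p -> new FakeConnection(server));
        }

        void clearCache() { cache.clear(); }
    }

    /** One iteration of the test loop: start a server, call format(), shut down. */
    static String runIteration(CachingClient client, int port) {
        FakeServer server = new FakeServer(port);
        try {
            return client.connect(server).call("format()");
        } catch (EOFException e) {
            return "EOF";
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) {
        CachingClient client = new CachingClient();
        System.out.println(runIteration(client, 9000)); // first server on the port
        System.out.println(runIteration(client, 9000)); // port reused, cached connection is stale
        client.clearCache();
        System.out.println(runIteration(client, 9000)); // fresh connection after clearing the cache
    }
}
```

In this model the second iteration fails only because the port number happens to repeat, which matches the "by random chance" nature of the flakiness described in the report.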