Date: Thu, 5 May 2016 03:18:12 +0000 (UTC)
From: "Masatake Iwasaki (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-2043) TestHFlush failing intermittently

    [ https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271828#comment-15271828 ]

Masatake Iwasaki commented on HDFS-2043:
----------------------------------------

In both the IOException and the ClosedByInterruptException cases, I can see the message "Got expected exception during
close" in the test logs. The exception was thrown by the second {{stm.close()}}, the one inside the catch block below.
{code}
    try {
      stm.close();
      // If we made it past the close(), then that means that the ack made it back
      // from the pipeline before we got to the wait() call. In that case we should
      // still have interrupted status.
      assertTrue(Thread.interrupted());
    } catch (InterruptedIOException ioe) {
      System.out.println("Got expected exception during close");
      // If we got the exception, we shouldn't have interrupted status anymore.
      assertFalse(Thread.currentThread().isInterrupted());

      // Now do a successful close.
      stm.close();
    }
{code}
The caught {{ioe}} points to {{DFSOutputStream#closeImpl}}. (I logged the stack trace by modifying TestHFlush in my local environment.)
{noformat}
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
	at org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:771)
	at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:697)
	at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:755)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
	at org.apache.hadoop.hdfs.TestHFlush.testHFlushInterrupted(TestHFlush.java:480)
{noformat}
testHFlushInterrupted expects the second {{stm.close()}} to succeed, but it does not. The underlying streamer thread has already been closed, because {{closeThreads(true)}} is called in the finally block of {{DFSOutputStream#closeImpl}}.
{code}
} finally {
  // Failures may happen when flushing data.
  // Streamers may keep waiting for the new block information.
  // Thus need to force closing these threads.
  // Don't need to call setClosed() because closeThreads(true)
  // calls setClosed() in the finally block.
  closeThreads(true);
}
{code}
I think we should just catch the IOException thrown by the second {{stm.close()}} and ignore it. The final check in the test should still fail if there is a real problem.
{code}
// verify that entire file is good
AppendTestUtil.checkFullFile(fs, p, 4, fileContents,
    "Failed to deal with thread interruptions", false);
{code}

> TestHFlush failing intermittently
> ---------------------------------
>
>                 Key: HDFS-2043
>                 URL: https://issues.apache.org/jira/browse/HDFS-2043
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Aaron T. Myers
>            Assignee: Lin Yiqun
>         Attachments: HDFS-2043.002.patch, HDFS-2043.003.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/
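
For illustration only, here is a minimal, self-contained sketch of the change suggested in the comment above: tolerate an IOException from the second close instead of requiring it to succeed. It does not use Hadoop. {{FakeStream}}, {{closeTolerantly}} and the class name are hypothetical stand-ins, and the simulated failure on the second close is an assumption modelled on the behaviour described above, not the real {{DFSOutputStream}} logic.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

public class TolerantSecondClose {

    /** Hypothetical stand-in for the HDFS output stream; not a Hadoop class. */
    static class FakeStream implements Closeable {
        private int closeCalls = 0;

        @Override
        public void close() throws IOException {
            closeCalls++;
            if (closeCalls == 1) {
                // First close: simulate the interrupt while waiting for acks.
                throw new InterruptedIOException(
                    "Interrupted while waiting for data to be acknowledged by pipeline");
            }
            // Later closes: simulate the streamer having been shut down already.
            throw new IOException("streamer already closed (simulated)");
        }
    }

    /** Closes the stream and returns a short description of the path taken. */
    static String closeTolerantly(Closeable stm) {
        try {
            stm.close();
            return "first close succeeded";
        } catch (InterruptedIOException ioe) {
            try {
                // The second close may fail because cleanup already ran.
                stm.close();
                return "second close succeeded";
            } catch (IOException e) {
                // Proposed behaviour: ignore it and rely on a later content check.
                return "second close failed and was ignored: " + e.getMessage();
            }
        } catch (IOException other) {
            // Any other failure of the first close is unexpected in this sketch.
            throw new UncheckedIOException(other);
        }
    }

    public static void main(String[] args) {
        System.out.println(closeTolerantly(new FakeStream()));
    }
}
```

In the real test, the final {{AppendTestUtil.checkFullFile}} call would remain the check that catches actual data problems. The sketch only shows the control flow around the two close calls.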