Date: Sat, 22 Oct 2016 03:39:58 +0000 (UTC)
From: "Yiqun Lin (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11044) TestRollingUpgrade fails intermittently

[ https://issues.apache.org/jira/browse/HDFS-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11044:
-----------------------------
    Attachment: HDFS-11044.001.patch

> TestRollingUpgrade fails intermittently
> ---------------------------------------
>
>                 Key: HDFS-11044
>                 URL:
https://issues.apache.org/jira/browse/HDFS-11044
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-11044.001.patch
>
>
> The test {{TestRollingUpgrade#testRollback}} fails intermittently in trunk (https://builds.apache.org/job/PreCommit-HDFS-Build/17250/testReport/). The stack trace:
> {code}
> java.lang.AssertionError: Test resulted in an unexpected exit
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1949)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
> 	at org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:351)
> {code}
> I looked into this. It seems an IOException occurs while writing files to the NN storage directories (see the Jenkins report). This exception is then remembered in {{ExitUtil.firstExitException}}. Finally, when the cluster is shut down, this exception is thrown.
> The exception info:
> {code}
> 2016-10-21 12:54:02,300 [main] FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1946)) - Test resulted in an unexpected exit
> org.apache.hadoop.util.ExitUtil$ExitException: java.io.IOException: All the storage failed while writing properties to VERSION file
> 	at org.apache.hadoop.hdfs.server.namenode.NNStorage.writeAll(NNStorage.java:1151)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.updateStorageVersion(FSImage.java:999)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:850)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:819)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
> {code}
> The IOException occurs because all of the storage directories have been removed. IMO, one reason is that a failure while writing properties or the transaction ID to storage causes the existing storage directory to be removed.
> The test {{TestRollingUpgrade#testRollback}} restarts the namenode many times, so these underlying IO exceptions can happen. I'm not sure whether that is normal here. But there is one fix I am sure of: we can use {{checkExitOnShutdown(false)}} to skip the ExitException check. This has already been done in {{TestRollingUpgrade#testRollingUpgradeWithQJM}}. In addition, since the shutdown is the last operation in the test, it will not influence the current logic.
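To make the mechanism in the report easier to follow, here is a minimal, self-contained sketch of the behavior it describes: the first exit cause is remembered, and a shutdown-time check turns it into an AssertionError unless the check is disabled. This is a toy model, not the Hadoop code. {{ExitRecorder}} and {{ToyCluster}} are hypothetical names invented for illustration, and the recorder is an instance field here purely to keep the example isolated. Only the {{checkExitOnShutdown}} name and the two messages are taken from the report.

```java
import java.io.IOException;

public class ExitCheckSketch {

    /** Hypothetical stand-in: remembers only the first exit cause it is given. */
    static final class ExitRecorder {
        private Exception firstExit;

        synchronized void terminate(Exception cause) {
            // Later causes are ignored; only the first one is kept.
            if (firstExit == null) {
                firstExit = cause;
            }
        }

        synchronized Exception firstExit() {
            return firstExit;
        }
    }

    /** Hypothetical stand-in for a test cluster with a checkExitOnShutdown switch. */
    static final class ToyCluster {
        private final ExitRecorder recorder;
        private final boolean checkExitOnShutdown;

        ToyCluster(ExitRecorder recorder, boolean checkExitOnShutdown) {
            this.recorder = recorder;
            this.checkExitOnShutdown = checkExitOnShutdown;
        }

        void shutdown() {
            Exception first = recorder.firstExit();
            // With the check enabled, a remembered exit fails the test at shutdown.
            if (checkExitOnShutdown && first != null) {
                throw new AssertionError("Test resulted in an unexpected exit", first);
            }
            // With the check disabled, the remembered exit is not inspected.
        }
    }

    public static void main(String[] args) {
        ExitRecorder recorder = new ExitRecorder();
        // Simulate the storage failure that happens during one of the restarts.
        recorder.terminate(new IOException(
                "All the storage failed while writing properties to VERSION file"));

        try {
            new ToyCluster(recorder, true).shutdown();
        } catch (AssertionError e) {
            System.out.println("check enabled: " + e.getMessage());
        }

        new ToyCluster(recorder, false).shutdown();
        System.out.println("check disabled: shutdown completed");
    }
}
```

In this model, disabling the check only suppresses the shutdown-time assertion. The underlying IOException still happened, which matches the open question in the report about whether those failures are normal.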
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org