hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11044) TestRollingUpgrade fails intermittently
Date Sat, 22 Oct 2016 03:18:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11044:
-----------------------------
    Description: 
The test {{TestRollingUpgrade#testRollback}} fails intermittently in trunk (https://builds.apache.org/job/PreCommit-HDFS-Build/17250/testReport/).
The stack trace:
{code}
java.lang.AssertionError: Test resulted in an unexpected exit
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1949)
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
	at org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:351)
{code}
I looked into it. It seems that an IOException occurs while writing files to the NameNode
storage directories (see the Jenkins report). This exception is then remembered in {{ExitUtil.firstExitException}}.
Finally, when the cluster is shut down, the remembered exception is reported and the test fails.

The exception info:
{code}
2016-10-21 12:54:02,300 [main] FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1946)) - Test resulted in an unexpected exit
org.apache.hadoop.util.ExitUtil$ExitException: java.io.IOException: All the storage failed while writing properties to VERSION file
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.writeAll(NNStorage.java:1151)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.updateStorageVersion(FSImage.java:999)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:850)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:819)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
{code}

The IOException occurs because all the storage directories have been removed. IMO, one reason
is that when writing properties or the transaction ID to a storage directory fails, that
existing storage directory is removed.
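
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the {{NNStorage}} code: {{ToyStorageDir}}, {{ToyStorage}} and {{StorageWriteSketch}} are hypothetical names. The assumption being modeled is that a directory is dropped once a write to it fails, and that the write as a whole fails only when no directory is left.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a storage directory; not a Hadoop class.
class ToyStorageDir {
    final String name;
    final boolean writable;

    ToyStorageDir(String name, boolean writable) {
        this.name = name;
        this.writable = writable;
    }

    // Simulates writing the VERSION file; fails for an unwritable directory.
    void writeProperties() throws IOException {
        if (!writable) {
            throw new IOException("cannot write VERSION in " + name);
        }
    }
}

// Hypothetical model of the behaviour assumed in the description.
class ToyStorage {
    final List<ToyStorageDir> dirs = new ArrayList<>();

    // Try every directory and drop each one that fails.
    // The call fails only if no directory is left afterwards.
    void writeAll() throws IOException {
        Iterator<ToyStorageDir> it = dirs.iterator();
        while (it.hasNext()) {
            ToyStorageDir dir = it.next();
            try {
                dir.writeProperties();
            } catch (IOException e) {
                it.remove();
            }
        }
        if (dirs.isEmpty()) {
            throw new IOException(
                "All the storage failed while writing properties to VERSION file");
        }
    }
}

class StorageWriteSketch {
    public static void main(String[] args) {
        ToyStorage storage = new ToyStorage();
        storage.dirs.add(new ToyStorageDir("name1", false));
        storage.dirs.add(new ToyStorageDir("name2", false));
        try {
            storage.writeAll();
            System.out.println("write succeeded, dirs left: " + storage.dirs.size());
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        }
    }
}
```

In this model a single transient write failure per directory is enough to end up with an empty directory list, which would match the intermittent nature of the test failure if the assumption holds.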

The test {{TestRollingUpgrade#testRollback}} restarts the NameNode many times, so these
underlying IO exceptions can occur. I'm not sure whether this is expected
here. But there is one fix that I am sure of: we can use {{checkExitOnShutdown(false)}} to
skip the ExitException check. This has already been done in {{TestRollingUpgrade#testRollingUpgradeWithQJM}}.
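
As a rough illustration of why turning off the shutdown check avoids the failure, the sketch below models the pattern in plain Java. It does not use the Hadoop classes: {{ToyExitRecorder}}, {{ToyCluster}} and {{ShutdownCheckSketch}} are hypothetical names, and the only assumption is the behaviour described in this issue, namely that an exit is remembered rather than executed, and that shutdown fails the test when the check is enabled.

```java
// Hypothetical recorder: remembers the first "exit" instead of ending the JVM.
class ToyExitRecorder {
    private static RuntimeException firstExit;

    // Record only the first exit request; later ones are ignored.
    static synchronized void terminate(int status, String msg) {
        if (firstExit == null) {
            firstExit = new RuntimeException("exit " + status + ": " + msg);
        }
    }

    static synchronized boolean terminateCalled() {
        return firstExit != null;
    }

    static synchronized void reset() {
        firstExit = null;
    }
}

// Hypothetical mini cluster with an opt-out flag for the shutdown check.
class ToyCluster {
    private final boolean checkExitOnShutdown;

    ToyCluster(boolean checkExitOnShutdown) {
        this.checkExitOnShutdown = checkExitOnShutdown;
    }

    // Fails the test if an exit was recorded and the check is enabled.
    void shutdown() {
        if (checkExitOnShutdown && ToyExitRecorder.terminateCalled()) {
            throw new AssertionError("Test resulted in an unexpected exit");
        }
    }
}

class ShutdownCheckSketch {
    public static void main(String[] args) {
        // Some background thread hit a fatal error earlier in the test.
        ToyExitRecorder.terminate(1, "All the storage failed");

        try {
            new ToyCluster(true).shutdown();
        } catch (AssertionError e) {
            System.out.println("check enabled: " + e.getMessage());
        }

        // With the check disabled, the recorded exit is ignored.
        new ToyCluster(false).shutdown();
        System.out.println("check disabled: shutdown completed");
    }
}
```

Note that disabling the check hides the recorded exit rather than preventing the underlying IOException, so it only makes sense if such exits are expected in this test.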



  was:
The test {{TestRollingUpgrade#testRollback}} fails intermittently in trunk(https://builds.apache.org/job/PreCommit-HDFS-Build/17250/testReport/).
The stack info:
{code}
java.lang.AssertionError: Test resulted in an unexpected exit
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1949)
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
	at org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:351)
{code}
I looked into that, it seems  there is some IOException happenning in writing files to nn
storages(Can see jenkins report). And then this exception will be remenbered in {{ExitUtil.firstExitException}}.
Finally when we do the cluster's shutdown operations, this exception will be threw.

The exception info:
{code}
2016-10-21 12:54:02,300 [main] FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1946))
- Test resulted in an unexpected exit
org.apache.hadoop.util.ExitUtil$ExitException: java.io.IOException: All the storage failed
while writing properties to VERSION file
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.writeAll(NNStorage.java:1151)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.updateStorageVersion(FSImage.java:999)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:850)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:819)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
{code}

The IOException is beacause that all the sotrage dir have be removed. IMO, one of the reason
is that when we  writing some properties or write transactionId to storage failed that lead
the existing sotrage to be removed.

In test {{TestRollingUpgrade#testRollback}} it will do many times for restarting namenode
operations, the underlying IO exceptions will be happened. So I'm not sure if it's normal
here. But one way the I am sure to fix this: We can use {{checkExitOnShutdown(false)}} to
skip the ExitException check. And this have been done in {{TestRollingUpgrade#testRollingUpgradeWithQJM}}.




> TestRollingUpgrade fails intermittently
> ---------------------------------------
>
>                 Key: HDFS-11044
>                 URL: https://issues.apache.org/jira/browse/HDFS-11044
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

