hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails
Date Mon, 28 Mar 2016 17:05:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214475#comment-15214475 ]

Wei-Chiu Chuang commented on HDFS-9599:

Hi [~linyiqun], thanks for the contribution.
Your patch makes sense to me. However, instead of starting and shutting down the cluster
explicitly in each test method, what about changing the annotation of {{setUp}} from {{@BeforeClass}}
to {{@Before}}, and the annotation of {{tearDown}} from {{@AfterClass}} to {{@After}}?

This will make sure that the cluster is shut down properly even if an exception is thrown
in the test method, while making sure the tests are isolated.
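
To illustrate the difference, here is a minimal sketch in plain Java that models the two fixture lifecycles without depending on JUnit or Hadoop. {{FakeCluster}}, {{TestBody}}, {{runShared}} and {{runIsolated}} are hypothetical names made up for this example; they are not part of {{MiniDFSCluster}} or the test in the patch. {{runShared}} mimics {{@BeforeClass}}/{{@AfterClass}} (one cluster for all tests), and {{runIsolated}} mimics {{@Before}}/{{@After}} (a fresh cluster per test, torn down in a {{finally}} block, much as JUnit 4 runs {{@After}} methods even when the test throws).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: FakeCluster and TestBody are hypothetical stand-ins,
// not the Hadoop MiniDFSCluster or JUnit APIs.
class FixtureIsolationSketch {

  /** Toy cluster that only remembers which blocks were written to it. */
  static final class FakeCluster {
    private final List<String> blocks = new ArrayList<>();
    void writeBlock(String id) { blocks.add(id); }
    int blockCount() { return blocks.size(); }
    void shutdown() { blocks.clear(); }
  }

  /** A test body is any action against a cluster that may throw. */
  interface TestBody {
    void run(FakeCluster cluster) throws Exception;
  }

  /** Class-level fixture: one cluster is created once and reused by every test. */
  static List<Integer> runShared(List<TestBody> tests) {
    List<Integer> seen = new ArrayList<>();
    FakeCluster cluster = new FakeCluster();
    try {
      for (TestBody test : tests) {
        try {
          test.run(cluster);
        } catch (Exception e) {
          // The test failed; the shared cluster keeps whatever it wrote.
        }
        seen.add(cluster.blockCount());
      }
    } finally {
      cluster.shutdown();
    }
    return seen;
  }

  /** Per-test fixture: each test gets a fresh cluster that is always shut down. */
  static List<Integer> runIsolated(List<TestBody> tests) {
    List<Integer> seen = new ArrayList<>();
    for (TestBody test : tests) {
      FakeCluster cluster = new FakeCluster();
      try {
        test.run(cluster);
      } catch (Exception e) {
        // The test failed; teardown below still runs.
      } finally {
        seen.add(cluster.blockCount());
        cluster.shutdown();
      }
    }
    return seen;
  }

  /** Two sample tests: the first writes a block and then fails. */
  static List<TestBody> demoTests() {
    TestBody failing = c -> {
      c.writeBlock("blk_A");
      throw new IllegalStateException("simulated test failure");
    };
    TestBody passing = c -> c.writeBlock("blk_B");
    return Arrays.asList(failing, passing);
  }

  public static void main(String[] args) {
    System.out.println("shared:   " + runShared(demoTests()));   // shared:   [1, 2]
    System.out.println("isolated: " + runIsolated(demoTests())); // isolated: [1, 1]
  }
}
```

With the shared fixture, the second test sees two blocks because the block left behind by the failed first test is still there. With the per-test fixture, each test sees only its own block.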

> TestDecommissioningStatus.testDecommissionStatus occasionally fails
> -------------------------------------------------------------------
>                 Key: HDFS-9599
>                 URL: https://issues.apache.org/jira/browse/HDFS-9599
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>         Environment: Jenkins
>            Reporter: Wei-Chiu Chuang
>            Assignee: Lin Yiqun
>         Attachments: HDFS-9599.001.patch
> From the test result of a recent Jenkins nightly build: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/
> The test failed because the number of under replicated blocks is 4, instead of 3.
> Looking at the log, there is a stray block, which might have caused the failure:
> {noformat}
> 2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange (BlockManager.java:processReport(2131)) - BLOCK* processReport: blk_1073741825_1001 on node size 16384 does not belong to any file
> {noformat}
> The block size 16384 suggests this is left over from the sibling test case testDecommissionStatusAfterDNRestart.
> This can happen because the same MiniDFSCluster is reused between tests.
> The test implementation should do a better job isolating tests.
> Another failure case arises when the load factor comes into play and a block cannot
> find enough DataNodes on which to place its replicas. In this test, the runtime should
> not consider the load factor:
> {noformat}
> {noformat}

