hadoop-hdfs-issues mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9865) TestBlockReplacement fails intermittently in trunk
Date Fri, 26 Feb 2016 12:07:18 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lin Yiqun updated HDFS-9865:
----------------------------
    Attachment: HDFS-9865.001.patch

> TestBlockReplacement fails intermittently in trunk
> --------------------------------------------------
>
>                 Key: HDFS-9865
>                 URL: https://issues.apache.org/jira/browse/HDFS-9865
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-9865.001.patch
>
>
> The test case {{TestBlockReplacement}} fails intermittently. Each time it fails, the unit
> test log shows the following:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
> testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)  Time elapsed: 8.764 sec  <<< FAILURE!
> java.lang.AssertionError: The block should be only on 1 datanode  expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
> {code}
> The cause is that the block has not been completely deleted in testDeletedBlockWhenAddBlockIsInEdit
> by the time the assertion runs, so the number of datanodes holding the block is wrong. The
> time the test waits for FsDatasetAsyncDiskService to delete the block is a fixed value that
> is not reliable.
> {code}
> LOG.info("replaceBlock:  " + replaceBlock(block,
>           (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc,
>           (DatanodeInfo)destDnDesc));
> // Waiting for the FsDatasetAsyncDsikService to delete the block
> Thread.sleep(3000);
> {code}
> When I reduce this sleep to 1 second, the test always fails. The 3-second sleep in the
> test is not a reliable value either. We should replace this logic with a better approach,
> such as polling for the expected state, the way testDecommision waits for the block to be
> replicated.
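
The polling approach suggested above could look roughly like the following. This is a minimal, self-contained sketch and not the content of the attached patch: the class name PollingWait, the method waitFor, and the AtomicInteger that stands in for the number of datanodes holding the block are all hypothetical. Hadoop's test utilities include a similar helper (GenericTestUtils.waitFor), but this message does not say whether the patch uses it.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hadoop: polls a condition instead of sleeping a fixed time.
public class PollingWait {

    /**
     * Re-evaluates check every intervalMs until it returns true.
     * Throws TimeoutException if timeoutMs elapses first.
     */
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the number of datanodes that still hold the block.
        AtomicInteger replicaCount = new AtomicInteger(2);

        // Stand-in for the asynchronous deletion: finishes after an unknown delay.
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            replicaCount.set(1);
        });
        deleter.start();

        // Wait until the block is on exactly one datanode, up to 5 seconds.
        waitFor(() -> replicaCount.get() == 1, 50, 5000);
        System.out.println("replicas = " + replicaCount.get());
    }
}
```

With this shape the test returns as soon as the deletion completes and only fails once a generous upper bound has passed, so the result no longer depends on guessing how long the asynchronous deletion takes.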



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
