hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10960) TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails at disk error verification after volume remove
Date Tue, 04 Oct 2016 21:03:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546654#comment-15546654
] 

Manoj Govindassamy commented on HDFS-10960:
-------------------------------------------


Looking at the code, removing volumes at the DataNode can potentially interrupt {{BlockReceiver}}.
If the {{BlockReceiver}} happens to be in the middle of an IO operation, such as flushing or
setting the channel position for the new checksum, it can throw an IOException. On getting an
IOException, {{BlockReceiver}} starts a thread to check for disk errors.
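
As a rough illustration of how an interrupt can surface as an IOException, here is a minimal sketch in plain java.nio. It is not Hadoop code: {{InterruptedWriteDemo}}, {{checkDiskError()}} and {{lastDiskErrorCheck}} are hypothetical stand-ins, and the sketch only assumes that the writer uses an interruptible channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptedWriteDemo {
    // Hypothetical stand-in for the DataNode's last disk check timestamp; 0 means "never checked".
    static volatile long lastDiskErrorCheck = 0;

    // Hypothetical stand-in for the disk error check started after an IOException.
    static void checkDiskError() {
        lastDiskErrorCheck = Math.max(1, System.currentTimeMillis());
    }

    /**
     * Writes to a temp file while the thread's interrupt flag is set.
     * Returns true if the write failed with ClosedByInterruptException.
     */
    static boolean writeWhileInterrupted() {
        Path file = null;
        try {
            file = Files.createTempFile("blk_", ".tmp");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                // Simulate the writer being interrupted (e.g. by a volume removal).
                Thread.currentThread().interrupt();
                ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
                return false;
            } catch (ClosedByInterruptException e) {
                // The interrupt surfaced as an IOException, so the disk check fires
                // even though the disk itself is healthy.
                checkDiskError();
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            Thread.interrupted(); // clear the interrupt flag before cleanup
            if (file != null) {
                file.toFile().delete();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("interrupted write failed: " + writeWhileInterrupted());
        System.out.println("disk check recorded: " + (lastDiskErrorCheck != 0));
    }
}
```

In this sketch, a test asserting that {{lastDiskErrorCheck}} stays at 0 would fail whenever the interrupt lands during the write, which is the kind of timing dependence described above.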

The TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten verification fails if the DataNode
ever started a disk error check thread. This verification doesn't seem useful, as we already
have another verification that checks the block replication factor. So the proposal here is
to replace it with a verification that checks whether the disk removal happened successfully
and whether the replication factor of the block caught up even after the volume removal.
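
The shape of such an end-state verification could look like the sketch below. It is a self-contained model, not the actual patch and not the Hadoop test API: {{VolumeRemovalCheck}}, {{waitFor}} and {{verify}} are hypothetical names, and the volume list and replica counter are in-memory stand-ins for what the test would query from the DataNode and NameNode.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class VolumeRemovalCheck {
    /** Polls the condition until it holds or the timeout elapses. */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /**
     * Hypothetical end-state check: the removed volume is no longer active,
     * and the replica count reaches the expected replication within the timeout.
     */
    static boolean verify(List<String> activeVolumes, String removedVolume,
                          AtomicInteger liveReplicas, int expectedReplication,
                          long timeoutMs) {
        boolean removed = !activeVolumes.contains(removedVolume);
        boolean caughtUp =
            waitFor(() -> liveReplicas.get() >= expectedReplication, 10, timeoutMs);
        return removed && caughtUp;
    }

    public static void main(String[] args) {
        List<String> volumes =
            new CopyOnWriteArrayList<>(Arrays.asList("/data1", "/data2"));
        AtomicInteger replicas = new AtomicInteger(2);

        volumes.remove("/data2"); // simulated volume removal

        // Simulated re-replication completing a little later on another thread.
        Thread reReplication = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // fall through and set the count anyway
            }
            replicas.set(3);
        });
        reReplication.start();

        System.out.println("verified: " + verify(volumes, "/data2", replicas, 3, 5000));
    }
}
```

Unlike the equality check on the last disk error check time, this style of verification does not depend on whether an interrupted write happened to trigger a disk check along the way.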

> TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails at disk error verification
after volume remove
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10960
>                 URL: https://issues.apache.org/jira/browse/HDFS-10960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>            Priority: Minor
>
> TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails occasionally in the following
verification.
> {code}
>   700     // If an IOException thrown from BlockReceiver#run, it triggers
>   701     // DataNode#checkDiskError(). So we can test whether checkDiskError() is called,
>   702     // to see whether there is IOException in BlockReceiver#run().
>   703     assertEquals(lastTimeDiskErrorCheck, dn.getLastDiskErrorCheck());
>   704 
> {code}
> {noformat}
> Error Message
> expected:<0> but was:<6498109>
> Stacktrace
> java.lang.AssertionError: expected:<0> but was:<6498109>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:703)
> 	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:620)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

