hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10434) Fix intermittent test failure of TestDataNodeErasureCodingMetrics
Date Thu, 04 Aug 2016 06:22:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407242#comment-15407242 ]

Rakesh R edited comment on HDFS-10434 at 8/4/16 6:21 AM:
---------------------------------------------------------

After a long analysis, I think I have finally found the cause of the failure. The test chooses
the datanode to corrupt incorrectly. Instead of picking a datanode that is actually used in the
block locations, it simply takes a datanode from the cluster by index, and that datanode may not
be present in the block locations.
{code}
    byte[] indices = lastBlock.getBlockIndices();
    //corrupt the first block
    DataNode toCorruptDn = cluster.getDataNodes().get(indices[0]);
{code}

For example, suppose the datanodes in the {{cluster.getDataNodes()}} list are indexed as 0->Dn1, 1->Dn2,
2->Dn3, 3->Dn4, 4->Dn5, 5->Dn6, 6->Dn7, 7->Dn8, 8->Dn9, 9->Dn10.

Assume the datanodes that are part of the block locations are Dn2, Dn3, Dn4, Dn5, Dn6, Dn7,
Dn8, Dn9, Dn10. In the failing scenario, the test picks {{cluster.getDataNodes().get(0)}} as the
datanode to corrupt, which is Dn1. Dn1 holds no part of the block, so corrupting it does not result
in any ECWork and the test fails. Ideally, the test should pick a datanode from the block locations for corruption.
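
The mismatch can be sketched without a real cluster. The snippet below is only a model of the scenario described above, not the actual test or the actual fix: datanodes are plain strings, and the class and method names ({{CorruptTargetSketch}}, {{pickByClusterIndex}}, {{pickFromLocations}}) are hypothetical names introduced for illustration. It assumes the block indices start at 0, as in the failing scenario.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the scenario above; datanodes are represented as strings.
public class CorruptTargetSketch {

  /** Mirrors the failing logic: uses a block index as a position in the cluster list. */
  static String pickByClusterIndex(List<String> clusterDns, byte[] blockIndices) {
    return clusterDns.get(blockIndices[0]);
  }

  /** Alternative: takes the datanode directly from the block's own locations. */
  static String pickFromLocations(List<String> blockLocations) {
    return blockLocations.get(0);
  }

  public static void main(String[] args) {
    // Cluster order: 0->Dn1, 1->Dn2, ..., 9->Dn10
    List<String> clusterDns = Arrays.asList(
        "Dn1", "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
    // Block locations: Dn2 .. Dn10 (Dn1 holds no part of the block)
    List<String> blockLocations = clusterDns.subList(1, 10);
    // Assumed block indices for the nine locations
    byte[] blockIndices = {0, 1, 2, 3, 4, 5, 6, 7, 8};

    String byIndex = pickByClusterIndex(clusterDns, blockIndices);
    String fromLocations = pickFromLocations(blockLocations);

    System.out.println("by cluster index: " + byIndex
        + ", in block locations: " + blockLocations.contains(byIndex));
    System.out.println("from block locations: " + fromLocations
        + ", in block locations: " + blockLocations.contains(fromLocations));
  }
}
```

In this model the index-based choice returns Dn1, which is not among the block locations, while the location-based choice returns Dn2, which is. In the real test the equivalent step would be to resolve the {{DataNode}} from an entry of the block's locations rather than from a position in {{cluster.getDataNodes()}}.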

Basically, there are two problems in this test case. The first one was fixed as part of this jira.
For the second one, I think I will raise another jira and fix it there, as the two problems are
unrelated. Please review the HDFS-10720 fix. Thanks!



> Fix intermittent test failure of TestDataNodeErasureCodingMetrics
> -----------------------------------------------------------------
>
>                 Key: HDFS-10434
>                 URL: https://issues.apache.org/jira/browse/HDFS-10434
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10434-00.patch, HDFS-10434-01.patch
>
>
> This jira is to fix the test case failure.
> Reference : [Build15485_TestDataNodeErasureCodingMetrics_testEcTasks|https://builds.apache.org/job/PreCommit-HDFS-Build/15485/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeErasureCodingMetrics/testEcTasks/]
> {code}
> Error Message
> Bad value for metric EcReconstructionTasks expected:<1> but was:<0>
> Stacktrace
> java.lang.AssertionError: Bad value for metric EcReconstructionTasks expected:<1> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228)
> 	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics.testEcTasks(TestDataNodeErasureCodingMetrics.java:92)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
