hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
Date Tue, 21 Apr 2015 04:52:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504323#comment-14504323 ]

Vinayakumar B commented on HDFS-7993:
-------------------------------------

Hi andreina,
The patch looks almost good with the updated test.
One update is still required, as mentioned in the previous comment.

bq. Could the test fail if the node becomes decommissioned right after it checks isDecommissionInProgress? Otherwise, it looks good.
Yes, you are right [~mingma]. Since 2 DNs are already available, the decommissioning DN might move to DECOMMISSIONED before fsck is executed and its output is checked.
To slow this down, I recommend starting only one datanode when the cluster is created. Once the DECOMMISSIONING state is verified in fsck, start another datanode and verify the DECOMMISSIONED state.

A few more nits to be fixed in the test.
1. Unnecessary assertion {{+    assertNotNull("Failed Cluster Creation", cluster);}}: if the build fails, it throws an exception directly.
2. For the current usage of DFSTestUtil, there is no need to construct it with the Builder; the static methods can be used directly.
{code}+    DFSTestUtil util =
+        new DFSTestUtil.Builder().setName(getClass().getSimpleName()).setNumFiles(1).build();
{code}
3. {{+      int count = 0;}} is not used. Either this count should be used in the while loop as a condition, or it should be removed. Also, I recommend adding the @Timeout annotation to the test.
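
To illustrate nit 3, here is a minimal sketch of a polling loop in which the counter actually bounds the wait. It is not taken from the patch: the class name {{BoundedWait}}, the method {{waitFor}}, and its parameters are hypothetical, and the condition stands in for whatever check the test performs on the fsck output.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the HDFS-7993 patch.
final class BoundedWait {
    /**
     * Polls the condition until it holds or maxAttempts checks have been made.
     * Returns true if the condition held, false if the attempts ran out
     * or the thread was interrupted while sleeping.
     */
    static boolean waitFor(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        int count = 0;
        // The counter is the loop condition, so the loop cannot spin forever.
        while (count < maxAttempts) {
            if (condition.getAsBoolean()) {
                return true;
            }
            count++;
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A test could then assert on the return value, so that an exhausted wait fails with a clear message instead of hanging until a test-level timeout fires.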

> Incorrect descriptions in fsck when nodes are decommissioned
> ------------------------------------------------------------
>
>                 Key: HDFS-7993
>                 URL: https://issues.apache.org/jira/browse/HDFS-7993
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Ming Ma
>            Assignee: J.Andreina
>         Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, HDFS-7993.4.patch, HDFS-7993.5.patch
>
>
> When you run fsck with "-files" or "-racks", you will get something like below if one of the replicas is decommissioned.
> {noformat}
> blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
> {noformat}
> That is because in NamenodeFsck, the repl count comes from the live replicas count, while the actual nodes come from LocatedBlock, which includes decommissioned nodes.
> Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement verifies a LocatedBlock that includes decommissioned nodes. However, it seems better to exclude the decommissioned nodes from the verification, just like how fsck excludes decommissioned nodes when it checks for under-replicated blocks.
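
The mismatch described in the issue can be reproduced with a small standalone sketch. This is a simplified model and not the NamenodeFsck code: the class {{FsckReplicaSketch}}, the {{Replica}} type, and both report methods are hypothetical, and only two replica states are modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the reporting mismatch; not NamenodeFsck itself.
final class FsckReplicaSketch {
    enum State { LIVE, DECOMMISSIONED }

    static final class Replica {
        final String node;
        final State state;
        Replica(String node, State state) {
            this.node = node;
            this.state = state;
        }
    }

    // Counts only live replicas, as the repl figure does in the issue description.
    static int liveCount(List<Replica> replicas) {
        int live = 0;
        for (Replica r : replicas) {
            if (r.state == State.LIVE) {
                live++;
            }
        }
        return live;
    }

    // Collects node names, optionally skipping decommissioned replicas.
    static List<String> nodes(List<Replica> replicas, boolean includeDecommissioned) {
        List<String> names = new ArrayList<>();
        for (Replica r : replicas) {
            if (includeDecommissioned || r.state == State.LIVE) {
                names.add(r.node);
            }
        }
        return names;
    }

    // Mismatched report: live count next to a node list that includes decommissioned nodes.
    static String reportAsDescribed(List<Replica> replicas) {
        return "repl=" + liveCount(replicas) + " " + nodes(replicas, true);
    }

    // Consistent report: the count and the node list both exclude decommissioned nodes.
    static String reportExcludingDecommissioned(List<Replica> replicas) {
        return "repl=" + liveCount(replicas) + " " + nodes(replicas, false);
    }
}
```

With three live replicas on dn1 to dn3 and a decommissioned one on dn4, the first report pairs a count of 3 with four node names, which is the confusing output quoted in the issue. Dropping decommissioned nodes from the list is only one way to make the two agree; how the actual patch resolves it is not shown in this message.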



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
