hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned
Date Mon, 13 Apr 2015 15:11:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492489#comment-14492489
] 

Ming Ma commented on HDFS-7993:
-------------------------------

HDFS-7933 improved the replica reporting for missing and under-replicated blocks with respect
to decommissioning. It appears we can build on that work to fix the reporting for fully replicated
blocks as well.

* Change {{report.append(" repl=" + liveReplicas);}} to {{report.append(" repl=" + totalReplicas);}}.
* Use {{NumberReplicas}} instead of {{DatanodeInfo}} to find replica details. Note, however,
that the NN has two different definitions of "stale". One is a "stale datanode": the datanode
hasn't sent a heartbeat for some time. The other is "stale block content": the NN hasn't
received a block report from that DN since failover; that is what {{NumberReplicas#replicasOnStaleNodes}}
counts. If we need to count "stale datanodes", we can add another field to {{NumberReplicas}}
for that.
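
To illustrate the first bullet, here is a minimal, self-contained sketch of how the reported count diverges from the node list. It does not use the real Hadoop classes: {{ReplicaCounts}} is a hypothetical stand-in for the counters kept by {{NumberReplicas}}, {{describe}} is a hypothetical helper, and it assumes that "total" means live plus decommissioned replicas.

```java
import java.util.Arrays;
import java.util.List;

public class FsckReplSketch {

    /** Hypothetical stand-in for the per-block counters kept by NumberReplicas. */
    static final class ReplicaCounts {
        final int live;
        final int decommissioned;

        ReplicaCounts(int live, int decommissioned) {
            this.live = live;
            this.decommissioned = decommissioned;
        }

        /** Assumption: total = live + decommissioned replicas. */
        int total() {
            return live + decommissioned;
        }
    }

    /** Builds an fsck-style line; useTotal selects which count is reported as repl. */
    static String describe(String block, long len, ReplicaCounts counts,
                           List<String> locations, boolean useTotal) {
        int repl = useTotal ? counts.total() : counts.live;
        return block + " len=" + len + " repl=" + repl + " " + locations;
    }

    public static void main(String[] args) {
        // Three live replicas plus one on a decommissioned node (dn4).
        ReplicaCounts counts = new ReplicaCounts(3, 1);
        List<String> locations = Arrays.asList("dn1", "dn2", "dn3", "dn4");

        // Current behaviour: repl=3, yet four nodes are listed.
        System.out.println(describe("blk_x", 42, counts, locations, false));
        // Proposed behaviour: repl=4 matches the four listed nodes.
        System.out.println(describe("blk_x", 42, counts, locations, true));
    }
}
```

With the live count the line reads {{repl=3}} next to four nodes; with the total it reads {{repl=4}}, which is consistent with the node list.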

> Incorrect descriptions in fsck when nodes are decommissioned
> ------------------------------------------------------------
>
>                 Key: HDFS-7993
>                 URL: https://issues.apache.org/jira/browse/HDFS-7993
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Ming Ma
>            Assignee: J.Andreina
>         Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch
>
>
> When you run fsck with "-files" or "-racks", you will get something like below if one
> of the replicas is decommissioned.
> {noformat}
> blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
> {noformat}
> That is because in NamenodeFsck the repl count comes from the live replica count, while
> the listed nodes come from LocatedBlock, which includes decommissioned nodes.
> Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement verifies a
> LocatedBlock that includes decommissioned nodes. It seems better to exclude the decommissioned
> nodes from the verification, just as fsck excludes decommissioned nodes when it checks
> for under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
