hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3087) Decomissioning on NN restart can complete without blocks being replicated
Date Tue, 13 Mar 2012 21:26:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228728#comment-13228728 ]

Kihwal Lee commented on HDFS-3087:

DatanodeManager.registerDatanode() calls checkDecommissioning(), which checks the data node
against the exclude list. If the node is in the list, it calls startDecommission(). At this
point you see messages such as "Start Decommissioning node xxx with 0 blocks."

On the data node side, there is a delay between calling registerDatanode() and sending the
initial block report. During that window the name node has no record of the node's blocks.
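
To make the ordering problem concrete, here is a minimal, self-contained sketch of the race.
It is not the actual HDFS code: the class DecommissionRaceSketch, the NameNodeModel and Node
types, and the methods checkDecommissionProgress(), processBlockReport() and simulate() are
hypothetical names invented for illustration. The only assumption taken from the behavior
described above is that the exclude-list check runs at registration, before any block report
has been processed.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model only; none of these types exist in Hadoop under these names.
public class DecommissionRaceSketch {
    enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    static class Node {
        final String name;
        AdminState state = AdminState.NORMAL;
        int knownBlocks = 0; // what the name node believes; stays 0 until a block report arrives
        Node(String name) { this.name = name; }
    }

    static class NameNodeModel {
        final Set<String> excludeList = new HashSet<>();

        // Registration checks the exclude list immediately, as described above.
        Node registerDatanode(String name) {
            Node node = new Node(name);
            if (excludeList.contains(name)) {
                node.state = AdminState.DECOMMISSION_IN_PROGRESS;
                System.out.println("Start Decommissioning node " + name
                        + " with " + node.knownBlocks + " blocks.");
            }
            return node;
        }

        // Hypothetical periodic check: nothing left to replicate means "done".
        void checkDecommissionProgress(Node node) {
            if (node.state == AdminState.DECOMMISSION_IN_PROGRESS && node.knownBlocks == 0) {
                node.state = AdminState.DECOMMISSIONED;
            }
        }

        // Hypothetical block report handler: records how many blocks the node holds.
        void processBlockReport(Node node, int blockCount) {
            node.knownBlocks = blockCount;
        }
    }

    // Runs one restart scenario and returns the node's state after the first progress check.
    static AdminState simulate(boolean reportBeforeCheck) {
        NameNodeModel nn = new NameNodeModel();
        nn.excludeList.add("dn1");
        Node node = nn.registerDatanode("dn1");
        if (reportBeforeCheck) {
            nn.processBlockReport(node, 100);
        }
        nn.checkDecommissionProgress(node);
        return node.state;
    }

    public static void main(String[] args) {
        System.out.println("check before block report: " + simulate(false));
        System.out.println("check after block report:  " + simulate(true));
    }
}
```

In this model, a progress check that runs before the block report marks the node as
decommissioned even though it holds blocks, while a check that runs after the report leaves
the node in progress until those blocks are accounted for.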

There are multiple ways to fix this behavior, but I would appreciate input from the community,
since a change in behavior could break other use cases.

> Decomissioning on NN restart can complete without blocks being replicated
> -------------------------------------------------------------------------
>                 Key: HDFS-3087
>                 URL: https://issues.apache.org/jira/browse/HDFS-3087
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Kihwal Lee
>            Priority: Critical
>             Fix For: 0.23.0, 0.24.0, 0.23.2, 0.23.3
> If a data node is added to the exclude list and the name node is restarted, decommissioning
> starts as soon as the data node registers. At this point the initial block report has
> not been sent, so the name node thinks the node has zero blocks and the decommissioning
> completes very quickly, without replicating the blocks on that node.


