hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3087) Decomissioning on NN restart can complete without blocks being replicated
Date Thu, 15 Mar 2012 17:51:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230357#comment-13230357 ]

Suresh Srinivas commented on HDFS-3087:
---------------------------------------

Kihwal, this is a good bug find. We should fix this. 

This problem is not that serious. Prior to 0.23, we shut down the datanode once decommissioning
completed. After HDFS-1547 we no longer shut down the DN. The DN continues to be shown as
decommissioned. The expectation is that an admin can later shut down the decommissioned
DNs and proceed with maintenance of the node. Given this, the question is: after we mark a DN
as decommissioned, what happens when its block report comes in? I suspect we move it back to
decommission in progress.

How about using the flag that DatanodeDescriptor has for tracking the first block report? We
should not mark a DN as decommissioned if its block report has not been received. I also agree
that we should not mark anything as decommissioned until we come out of safemode.
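
The guard proposed above could be sketched roughly as follows. This is only an illustration
under stated assumptions: the class, field, and method names (DecommissionGuardSketch, Node,
firstBlockReportReceived, canMarkDecommissioned, checkDecommissionState) are hypothetical
stand-ins and not the actual DatanodeDescriptor or NameNode code. It shows a node staying in
decommission in progress until the NN is out of safemode, the first block report has arrived,
and no blocks remain to be replicated.

```java
/**
 * Sketch only: every type here is a hypothetical stand-in,
 * not the real HDFS NameNode or DatanodeDescriptor classes.
 */
final class DecommissionGuardSketch {

    enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    /** Hypothetical per-datanode state as tracked by the name node. */
    static final class Node {
        AdminState adminState = AdminState.DECOMMISSION_IN_PROGRESS;
        // Assumed flag: set once the first block report after registration arrives.
        boolean firstBlockReportReceived = false;
        // Blocks on this node that still lack enough replicas elsewhere.
        int underReplicatedBlocks = 0;
    }

    /** Returns true only when it is safe to declare the node decommissioned. */
    static boolean canMarkDecommissioned(Node node, boolean inSafeMode) {
        if (inSafeMode) {
            return false; // block map is still incomplete during safemode
        }
        if (!node.firstBlockReportReceived) {
            return false; // a zero block count means nothing before the first report
        }
        return node.underReplicatedBlocks == 0;
    }

    /** Moves an in-progress node to DECOMMISSIONED only if the guard allows it. */
    static void checkDecommissionState(Node node, boolean inSafeMode) {
        if (node.adminState == AdminState.DECOMMISSION_IN_PROGRESS
                && canMarkDecommissioned(node, inSafeMode)) {
            node.adminState = AdminState.DECOMMISSIONED;
        }
    }

    public static void main(String[] args) {
        Node node = new Node();

        // Registered after NN restart, no block report yet: must not complete.
        checkDecommissionState(node, false);
        System.out.println(node.adminState); // DECOMMISSION_IN_PROGRESS

        // First block report received and nothing left to replicate: may complete.
        node.firstBlockReportReceived = true;
        checkDecommissionState(node, false);
        System.out.println(node.adminState); // DECOMMISSIONED
    }
}
```

With a check like this, the zero block count seen right after registration no longer lets
decommissioning complete, because the missing first block report blocks the transition.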
                
> Decomissioning on NN restart can complete without blocks being replicated
> -------------------------------------------------------------------------
>
>                 Key: HDFS-3087
>                 URL: https://issues.apache.org/jira/browse/HDFS-3087
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 0.23.0, 0.24.0, 0.23.2, 0.23.3
>
>
> If a data node is added to the exclude list and the name node is restarted, the decommissioning
happens right away on the data node registration. At this point the initial block report has
not been sent, so the name node thinks the node has zero blocks and the decommissioning completes
very quickly, without replicating the blocks on that node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
