hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8056) Decommissioned dead nodes should continue to be counted as dead after NN restart
Date Mon, 16 Nov 2015 21:06:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007317#comment-15007317 ]

Andrew Wang commented on HDFS-8056:
-----------------------------------

Thanks for working on this, Ming. +1, LGTM. This seems in line with our earlier work on
decommissioning dead DNs in HDFS-7725 and HDFS-7374.

The patch still applied cleanly, but I started another precommit job since this patch has
been sitting for a while. Let's commit when that comes back.

> Decommissioned dead nodes should continue to be counted as dead after NN restart
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-8056
>                 URL: https://issues.apache.org/jira/browse/HDFS-8056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>              Labels: BB2015-05-RFC
>         Attachments: HDFS-8056-2.patch, HDFS-8056.patch
>
>
> We had some offline discussion with [~andrew.wang] and [~cmccabe] about this. Bringing this
> up for more input and to get the patch in place.
> Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, after the NN restarts,
> nodes that were already dead before the restart won't be in {{datanodeMap}}. {{DatanodeManager}}'s
> {{getDatanodeListForReport}} will add those dead nodes back, but not if they are in the exclude
> file.
> {noformat}
>     if (listDeadNodes) {
>       for (InetSocketAddress addr : includedNodes) {
>         if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
>           continue;
>         }
>         // The remaining nodes are ones that are referenced by the hosts
>         // files but that we do not know about, ie that we have never
>         // heard from. Eg. an entry that is no longer part of the cluster
>         // or a bogus entry was given in the hosts files
>         //
>         // If the host file entry specified the xferPort, we use that.
>         // Otherwise, we guess that it is the default xfer port.
>         // We can't ask the DataNode what it had configured, because it's
>         // dead.
>         DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
>                 .getAddress().getHostAddress(), addr.getHostName(), "",
>                 addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
>                 defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
>         setDatanodeDead(dn);
>         nodes.add(dn);
>       }
>     }
> {noformat}
> The issue here is that the decommissioned dead node count reported through JMX will be
> different after an NN restart. It might be better to make it consistent across NN restarts.
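
For readers following along, the inconsistency described above can be reproduced with a small standalone model. This is only a sketch, not the HDFS-8056 patch and not Hadoop code: the class {{DeadNodeReportSketch}}, the method {{listDeadNodes}}, the {{skipExcluded}} flag and the host names are all hypothetical, and plain strings stand in for {{DatanodeDescriptor}} and the hosts-file entries. The {{registered}} map plays the role of {{datanodeMap}} (host to "is alive").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeadNodeReportSketch {

    /**
     * Simplified model of the listDeadNodes branch quoted above.
     *
     * @param registered   hypothetical stand-in for datanodeMap: host -> alive?
     * @param included     entries from the include file
     * @param excluded     entries from the exclude file
     * @param skipExcluded true mirrors the quoted check (excluded hosts are
     *                     skipped); false is one possible alternative, assumed
     *                     here purely for comparison
     */
    static List<String> listDeadNodes(Map<String, Boolean> registered,
                                      List<String> included,
                                      Set<String> excluded,
                                      boolean skipExcluded) {
        List<String> dead = new ArrayList<>();
        // Dead nodes the NameNode still remembers.
        for (Map.Entry<String, Boolean> e : registered.entrySet()) {
            if (!e.getValue()) {
                dead.add(e.getKey());
            }
        }
        // Included hosts the NameNode has no record of are reported as dead,
        // unless the exclusion check filters them out.
        for (String host : included) {
            if (registered.containsKey(host)
                    || (skipExcluded && excluded.contains(host))) {
                continue;
            }
            dead.add(host);
        }
        return dead;
    }

    public static void main(String[] args) {
        List<String> included = List.of("dn1", "dn2", "dn3");
        Set<String> excluded = Set.of("dn3"); // dn3 was decommissioned

        // Before restart: dn2 and dn3 are dead but still in the map.
        Map<String, Boolean> beforeRestart = new LinkedHashMap<>();
        beforeRestart.put("dn1", true);
        beforeRestart.put("dn2", false);
        beforeRestart.put("dn3", false);

        // After restart: only the live node has re-registered.
        Map<String, Boolean> afterRestart = new LinkedHashMap<>();
        afterRestart.put("dn1", true);

        System.out.println(listDeadNodes(beforeRestart, included, excluded, true));
        System.out.println(listDeadNodes(afterRestart, included, excluded, true));
        System.out.println(listDeadNodes(afterRestart, included, excluded, false));
    }
}
```

In this model the decommissioned dead node {{dn3}} appears in the dead list before the restart, disappears after it while the exclusion check is applied, and reappears once that check is dropped. Whether dropping the check is how the attached patch achieves consistency is not stated in this message.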



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
