hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8056) Decommissioned dead nodes should continue to be counted as dead after NN restart
Date Fri, 03 Apr 2015 00:20:53 GMT
Ming Ma created HDFS-8056:

             Summary: Decommissioned dead nodes should continue to be counted as dead after NN restart
                 Key: HDFS-8056
                 URL: https://issues.apache.org/jira/browse/HDFS-8056
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Ming Ma

We had some offline discussion with [~andrew.wang] and [~cmccabe] about this. Bringing it up
here to get more input and to get a patch in place.

Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, after the NN restarts,
nodes that were dead before the restart won't be in {{datanodeMap}}. {{DatanodeManager}}'s
{{getDatanodeListForReport}} will add those dead nodes, but not if they are in the exclude file:

    if (listDeadNodes) {
      for (InetSocketAddress addr : includedNodes) {
        if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
          // Skip nodes we already know about and nodes in the exclude file.
          continue;
        }
        // The remaining nodes are ones that are referenced by the hosts
        // files but that we do not know about, i.e. that we have never
        // heard from. E.g. an entry that is no longer part of the cluster
        // or a bogus entry was given in the hosts files.
        // If the host file entry specified the xferPort, we use that.
        // Otherwise, we guess that it is the default xfer port.
        // We can't ask the DataNode what it had configured, because it's
        // dead.
        DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
                .getAddress().getHostAddress(), addr.getHostName(), "",
                addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
                defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));

The issue here is that the JMX value for decommissioned dead nodes will be different after an NN
restart. It would be better to make it consistent across NN restarts.
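
To make the inconsistency concrete, here is a minimal, self-contained sketch of the behavior described above. It is a simplified model, not the actual Hadoop code: the class {{DeadNodeReportSketch}}, the method {{deadNodesForReport}}, and the use of plain host-name strings and a boolean "alive" flag are all hypothetical stand-ins for {{DatanodeManager}}, {{getDatanodeListForReport}}, and {{datanodeMap}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the dead-node report; not Hadoop's real classes.
public class DeadNodeReportSketch {

    // datanodeMap: host -> true if alive, false if dead (assumed representation).
    static List<String> deadNodesForReport(Set<String> included,
                                           Set<String> excluded,
                                           Map<String, Boolean> datanodeMap) {
        List<String> dead = new ArrayList<>();
        Set<String> found = new HashSet<>();

        // Nodes the NameNode already knows about: report the dead ones.
        for (Map.Entry<String, Boolean> e : datanodeMap.entrySet()) {
            found.add(e.getKey());
            if (!e.getValue()) {
                dead.add(e.getKey());
            }
        }

        // Included hosts the NameNode has never heard from are treated as dead,
        // unless they also appear in the exclude file.
        for (String host : included) {
            if (found.contains(host) || excluded.contains(host)) {
                continue;
            }
            dead.add(host);
        }

        Collections.sort(dead);
        return dead;
    }

    public static void main(String[] args) {
        Set<String> included = new HashSet<>(Arrays.asList("dn1", "dn2"));
        Set<String> excluded = new HashSet<>(Arrays.asList("dn2")); // dn2 was decommissioned

        // Before restart: dn2 is known to the NameNode and marked dead.
        Map<String, Boolean> beforeRestart = new HashMap<>();
        beforeRestart.put("dn1", true);
        beforeRestart.put("dn2", false);

        // After restart: only dn1 has re-registered; dn2 is absent from the map.
        Map<String, Boolean> afterRestart = new HashMap<>();
        afterRestart.put("dn1", true);

        System.out.println("before restart: "
                + deadNodesForReport(included, excluded, beforeRestart));
        System.out.println("after restart:  "
                + deadNodesForReport(included, excluded, afterRestart));
    }
}
```

In this model, dn2 appears in the dead list before the restart but disappears from it afterwards, because the exclude check skips it once it is no longer in the map. That is the kind of discrepancy the JMX numbers would show.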

