hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6626) Node is marked decommissioned if it becomes dead when it is being decommissioned
Date Fri, 04 Jul 2014 02:27:34 GMT
Ming Ma created HDFS-6626:
-----------------------------

             Summary: Node is marked decommissioned if it becomes dead when it is being decommissioned
                 Key: HDFS-6626
                 URL: https://issues.apache.org/jira/browse/HDFS-6626
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ming Ma


I am not sure whether this is by design, but it isn't intuitive. The scenario: you start
decommissioning a node, and while the decommission is in progress the node becomes dead from
the NN's point of view. Right after that, the NN marks the node as decommissioned, so on the
web UI administrators will conclude that the decommission completed successfully. This happens
because decommission is considered done once there are no blocks left for the DN.

{noformat}
BlockManager.java
  boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
    boolean status = false;
...
    final Iterator<? extends Block> it = srcNode.getBlockIterator();
    while(it.hasNext()) {
...
      // set status to true if any block is still under replication
    }
...
    return status;
  }
{noformat}
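
To make the failure mode concrete, here is a minimal, self-contained sketch of the same loop shape. The `Block` class and the `looksDecommissioned` helper are hypothetical stand-ins, not the real HDFS classes, and the assumption that the caller treats "no replication in progress" as "decommission done" is taken from the description above rather than from the actual NN code.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model; these are NOT the real HDFS classes.
class ReplicationCheckSketch {
    /** Minimal block model: only records whether the block still needs more replicas. */
    static final class Block {
        final boolean underReplicated;
        Block(boolean underReplicated) { this.underReplicated = underReplicated; }
    }

    /** Same shape as the quoted loop: status can only become true inside the loop. */
    static boolean isReplicationInProgress(List<Block> blocksOnNode) {
        boolean status = false;
        Iterator<Block> it = blocksOnNode.iterator();
        while (it.hasNext()) {
            if (it.next().underReplicated) {
                status = true;
            }
        }
        return status;
    }

    /** Assumed caller logic: the node is treated as done when nothing is in progress. */
    static boolean looksDecommissioned(List<Block> blocksOnNode) {
        return !isReplicationInProgress(blocksOnNode);
    }
}
```

If a dead node ends up with an empty block list, the loop body never runs, `status` stays false, and the node is indistinguishable from one whose blocks were all safely re-replicated.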

The question is whether we should mark the dead node as decommissioned (the current
behavior) or mark it as "decommission aborted". When administrators decommission nodes,
they want to know both the decommission status and the health of the
decommission-in-progress nodes. If they can detect a decommission failure earlier, they may
be able to act earlier; for example, if a top-of-rack (TOR) switch has issues during
decommission, administrators could quickly spot a group of "decommission aborted" nodes on
the same rack. This information is still available by joining the decommissioning node list
with the recent dead node list on the web UI; it is just not as convenient.
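
As a rough illustration of that manual join, the sketch below intersects a list of decommissioning nodes with a set of dead nodes and groups the result by rack. All names here (`abortedByRack`, the host names, the rack map) are hypothetical; how the three inputs would be obtained from the NN is outside the scope of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; it does not call any HDFS API.
class DecommissionJoinSketch {
    /**
     * Returns the nodes that are both being decommissioned and dead,
     * grouped by rack, i.e. the candidates for "decommission aborted".
     */
    static Map<String, List<String>> abortedByRack(
            List<String> decommissioningNodes,
            Set<String> deadNodes,
            Map<String, String> rackOfNode) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String node : decommissioningNodes) {
            if (deadNodes.contains(node)) {
                // Nodes with no known rack are collected under a placeholder key.
                String rack = rackOfNode.getOrDefault(node, "unknown-rack");
                result.computeIfAbsent(rack, r -> new ArrayList<>()).add(node);
            }
        }
        return result;
    }
}
```

A rack with many entries in the result would point at a shared cause such as the TOR switch, which is the signal a dedicated "decommission aborted" state would surface directly.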

Suggestions?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
