hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7374) Allow decommissioning of dead DataNodes
Date Fri, 14 Nov 2014 20:15:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212766#comment-14212766 ]

Andrew Wang commented on HDFS-7374:
-----------------------------------

Hey [~mingma], I was looking a bit more at decom, and I see that we have this if statement
at the end of {{isReplicationInProgress}}:

{code}
    if (!status && !srcNode.isAlive) {
      LOG.warn("srcNode " + srcNode + " is dead " +
          "when decommission is in progress. Continue to mark " +
          "it as decommission in progress. In that way, when it rejoins the " +
          "cluster it can continue the decommission process.");
      status = true;
    }
{code}

Logically, a (DEAD, DECOM_IN_PROGRESS) node should be able to go to (DEAD, DECOMMED) once all of
its blocks are fully replicated, but this if statement prevents {{isReplicationInProgress}}
from ever returning false for a dead node. It seems like we can loosen this requirement?
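
To make the state transition concrete, here is a minimal standalone sketch, not the actual HDFS code. {{DecomStatusSketch}}, {{Node}}, {{currentStatus}}, {{loosenedStatus}} and the {{underReplicated}} flag are hypothetical names for illustration. It assumes {{status}} in the snippet above is true exactly when some block on the node still needs replication.

```java
public class DecomStatusSketch {
    /** Hypothetical stand-in for a DataNode descriptor; not the real HDFS class. */
    static final class Node {
        final boolean isAlive;
        Node(boolean isAlive) { this.isAlive = isAlive; }
    }

    /**
     * Mirrors the quoted if statement: a dead node is always reported as
     * still in progress, even when nothing is left to replicate.
     * underReplicated is an assumed input: true if any block still needs copies.
     */
    static boolean currentStatus(boolean underReplicated, Node srcNode) {
        boolean status = underReplicated;
        if (!status && !srcNode.isAlive) {
            status = true;
        }
        return status;
    }

    /**
     * One possible loosening (an assumption, not the committed fix): the
     * result depends only on block replication, so liveness is ignored and
     * (DEAD, DECOM_IN_PROGRESS) can move on to (DEAD, DECOMMED).
     */
    static boolean loosenedStatus(boolean underReplicated, Node srcNode) {
        return underReplicated;
    }

    public static void main(String[] args) {
        Node dead = new Node(false);
        // Dead node, all blocks fully replicated:
        System.out.println("current:  " + currentStatus(false, dead));
        System.out.println("loosened: " + loosenedStatus(false, dead));
    }
}
```

With the current check, the dead node with fully replicated blocks stays in progress indefinitely. With the loosened check it can complete, while a dead node that still has under-replicated blocks remains in progress in both versions.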

> Allow decommissioning of dead DataNodes
> ---------------------------------------
>
>                 Key: HDFS-7374
>                 URL: https://issues.apache.org/jira/browse/HDFS-7374
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7374-001.patch, HDFS-7374-002.patch
>
>
> We have seen the use case of decommissioning DataNodes that are already dead or unresponsive,
> and not expected to rejoin the cluster.
> The logic introduced by HDFS-6791 will mark those nodes as {{DECOMMISSION_INPROGRESS}},
> with a hope that they can come back and finish the decommission work. If an upper layer application
> is monitoring the decommissioning progress, it will hang forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
