Date: Tue, 17 Jan 2017 22:17:26 +0000 (UTC)
From: "Andrew Wang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11285) Dead DataNodes keep a long time in (Dead, DECOMMISSION_INPROGRESS), and never transition to (Dead, DECOMMISSIONED)

    [ https://issues.apache.org/jira/browse/HDFS-11285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826934#comment-15826934 ]

Andrew Wang commented on HDFS-11285:
------------------------------------

Yea, this has been a long-standing issue in HDFS. It typically happens because some app, such as Flume or HBase, is slowly writing data to an HDFS file; based on your output, the write pipeline for that file has also been reduced to a single DN. In these apps there's usually some way of closing the current file and writing to a new one.
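A rough sketch of that kind of file roll is below; the path naming and the roll/write helpers are purely illustrative, not taken from any particular app:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileRoller {
  private final FileSystem fs;
  private FSDataOutputStream current;
  private int sequence = 0;

  public FileRoller(Configuration conf) throws Exception {
    this.fs = FileSystem.get(conf);
    this.current = openNext();
  }

  private FSDataOutputStream openNext() throws Exception {
    // Illustrative path; a real app would use its own naming scheme.
    return fs.create(new Path("/data/events/part-" + (sequence++)));
  }

  public synchronized void write(byte[] record) throws Exception {
    current.write(record);
    current.hflush(); // make the data visible to readers without closing the file
  }

  public synchronized void roll() throws Exception {
    // Close the current file, then start writing to a fresh one.
    current.close();
    current = openNext();
  }
}
{code}

Closing the file finalizes its last block and releases the lease, and the next file gets a fresh write pipeline built from currently live DataNodes.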
We have some pipeline replacement policies that might help here, e.g. {{dfs.client.block.write.replace-datanode-on-failure.enable}} and {{dfs.client.block.write.replace-datanode-on-failure.policy}} (a rough client-side sketch is at the end of this message). Finally, finding the app that is writing this file can be difficult. A heavy-handed method is looking at {{fsck -files -blocks -locations}} output. I remember Kihwal was also working on a patch to dump the leases on the NN.


> Dead DataNodes keep a long time in (Dead, DECOMMISSION_INPROGRESS), and never transition to (Dead, DECOMMISSIONED)
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11285
>                 URL: https://issues.apache.org/jira/browse/HDFS-11285
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Lantao Jin
>         Attachments: DecomStatus.png
>
>
> We have seen the use case of decommissioning DataNodes that are already dead or unresponsive and not expected to rejoin the cluster. In a large cluster we saw more than 100 nodes that were dead and still decommissioning even though their {panel} Under replicated blocks {panel} and {panel} Blocks with no live replicas {panel} counts were both ZERO. This was actually fixed in [HDFS-7374|https://issues.apache.org/jira/browse/HDFS-7374]; with that fix, running refreshNodes twice eliminates this case. But it seems the patch was lost in the refactoring done in [HDFS-7411|https://issues.apache.org/jira/browse/HDFS-7411]. We are using a Hadoop version based on 2.7.1, and only the operations below can transition the status from {panel} Dead, DECOMMISSION_INPROGRESS {panel} to {panel} Dead, DECOMMISSIONED {panel}:
> # Remove it from hdfs-exclude
> # refreshNodes
> # Re-add it to hdfs-exclude
> # refreshNodes
> So, why was this code removed in the refactored DecommissionManager?
> {code:java}
> if (!node.isAlive) {
>   LOG.info("Dead node " + node + " is decommissioned immediately.");
>   node.setDecommissioned();
> }
> {code}
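For reference, a minimal client-side sketch of the pipeline replacement settings mentioned above; the class name and the values shown are only illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PipelineReplacementExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the client to replace a failed DataNode in the write pipeline
    // rather than continuing with fewer replicas.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
    // DEFAULT applies a heuristic based on the replication factor and the
    // number of remaining DataNodes; ALWAYS and NEVER are the other policies.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
    FileSystem fs = FileSystem.get(conf);
    // ... create or append files with this FileSystem so long-lived writes
    // try to keep a full pipeline when a DataNode drops out ...
    fs.close();
  }
}
{code}

Whether this helps in practice depends on how many live DataNodes are still available to take over the pipeline.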