hadoop-hdfs-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9287) Block placement completely fails if too many nodes are decommissioning
Date Thu, 05 Jan 2017 01:24:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Junping Du updated HDFS-9287:
    Fix Version/s:     (was: 2.8.0)

> Block placement completely fails if too many nodes are decommissioning
> ----------------------------------------------------------------------
>                 Key: HDFS-9287
>                 URL: https://issues.apache.org/jira/browse/HDFS-9287
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Daryn Sharp
>            Assignee: Kuhu Shukla
>            Priority: Critical
> The DatanodeManager coordinates with the HeartbeatManager to update HeartbeatManager.Stats, which tracks capacity and load. This is crucial for block placement to take space and load into account. It is completely broken for decommissioning (decomm) nodes.
> The heartbeat manager subtracts a node's prior values before it adds the new values. During registration of a decomm node, it subtracts before the initial values have been seeded. This decrements nodesInService. The state then flips to decomm, so the add does not increment nodesInService (which is correct for a decomm node), and the earlier decrement is never balanced. There are other math bugs (double adding) that only work by accident because the values involved are 0.
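The registration ordering described above can be illustrated with a toy counter. What follows is a hypothetical, heavily simplified Java model: the names `DecommStatsSketch`, `Node`, `Stats`, `registerInService` and `registerDecommissioningBuggy` are invented for illustration and do not reproduce Hadoop's actual HeartbeatManager or DatanodeManager code. It only shows how subtracting "prior" values that were never added makes each decomm registration drift `nodesInService` down by one.

```java
/**
 * Hypothetical, heavily simplified model of the stats bookkeeping described
 * in HDFS-9287. Names are illustrative only; this is NOT Hadoop's
 * HeartbeatManager or DatanodeManager code.
 */
public class DecommStatsSketch {

    /** Minimal stand-in for a datanode's admin state (hypothetical). */
    static final class Node {
        boolean decommissioning;
    }

    /** Stand-in for the aggregate "nodes in service" counter (hypothetical). */
    static final class Stats {
        int nodesInService;

        // Remove a node's prior contribution from the aggregate.
        void subtract(Node n) {
            if (!n.decommissioning) {
                nodesInService--;
            }
        }

        // Add a node's current contribution to the aggregate.
        void add(Node n) {
            if (!n.decommissioning) {
                nodesInService++;
            }
        }
    }

    /** Normal in-service registration: seed the node's values. */
    static void registerInService(Stats stats, Node n) {
        stats.add(n);
    }

    /**
     * Ordering as described in the report: subtract "prior" values that were
     * never added, flip the node to decommissioning, then add (which, correctly,
     * does not count a decommissioning node as in service).
     */
    static void registerDecommissioningBuggy(Stats stats, Node n) {
        stats.subtract(n);         // node not yet flagged: counter is decremented
        n.decommissioning = true;  // state flips to decomm
        stats.add(n);              // node now flagged: no increment
    }

    public static void main(String[] args) {
        Stats stats = new Stats();
        for (int i = 0; i < 10; i++) {
            registerInService(stats, new Node());
        }
        System.out.println("after 10 in-service nodes: " + stats.nodesInService);
        for (int i = 0; i < 3; i++) {
            registerDecommissioningBuggy(stats, new Node());
        }
        // Each decomm registration drifts the counter down by one.
        System.out.println("after 3 decomm nodes: " + stats.nodesInService);
    }
}
```

In this toy model, running `main` prints 10 after the in-service registrations and 7 after the three decomm registrations, whereas the intended count stays at 10, because decomm nodes should simply not be counted rather than subtracted.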
> As a result, every decomm node decrements the node count used for block placement. When enough nodes are decomm, the replication monitor silently stops working, with no logging: it searches all nodes and just gives up. Eventually, all block allocation will also fail completely. No files can be created, and no jobs can be submitted.

This message was sent by Atlassian JIRA

