hadoop-hdfs-issues mailing list archives

From "David Mollitor (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13157) Do Not Remove Blocks Sequentially During Decommission
Date Thu, 12 Sep 2019 18:58:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928827#comment-16928827 ]

David Mollitor commented on HDFS-13157:


Doing a 'random queue' is very tricky.  It's always mathematically possible that some item
sits in the queue for all time: since selection is random, there is no guarantee that it will
ever be selected.
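
To put a rough number on that concern, here is a minimal sketch. It assumes a simplified model that is not part of the patch: the queue is kept at a constant size and each pick is uniform over the queued items. The class and method names are made up for illustration. Under that assumption, the chance that one particular item is still waiting after n picks is (1 - 1/size)^n, which shrinks quickly but never reaches zero.

```java
public class RandomQueueStarvation {
    /**
     * Probability that one particular item is still waiting after `picks`
     * uniform random selections from a queue holding `queueSize` items.
     * Assumption: the queue is refilled so that its size stays constant.
     */
    static double probabilityStillWaiting(int queueSize, long picks) {
        return Math.pow(1.0 - 1.0 / queueSize, picks);
    }

    public static void main(String[] args) {
        int queueSize = 1000; // arbitrary example size
        for (long picks : new long[] {1_000L, 10_000L, 100_000L}) {
            System.out.printf("after %d picks: %.3e%n",
                    picks, probabilityStillWaiting(queueSize, picks));
        }
    }
}
```

The probability is tiny for large n but always positive, so a purely random queue gives no hard bound on how long a block can wait.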

I am thinking something like this as an alternative:

# Mark each node as decommissioned
# Grab the lock
# Create small batches of blocks... rolling through the list of DataNodes, rolling through
the list of volumes (as proposed in this patch)
# Wrap each item in the batch in a `Future` and submit it to the queue
# Release the lock
# Wait for every `Future` in the batch to complete (with a timeout)
# Repeat until done

This would require the Replication Queue to take a `Future`, which is probably not a bad thing.
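
The steps above could look roughly like the following. This is only a sketch under stated assumptions, not the patch itself and not the actual DatanodeAdminManager code: a plain `ExecutorService` stands in for the Replication Queue, a `ReentrantLock` stands in for whatever lock guards scheduling, block IDs are plain strings, and `replicate`, `nextBatch` and `decommission` are hypothetical names. Marking the node as decommissioned is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class BatchedDecommissionSketch {
    // Hypothetical stand-in for the lock held while scheduling blocks.
    private static final ReentrantLock LOCK = new ReentrantLock();

    /** Hypothetical placeholder for copying one block to another DataNode. */
    static String replicate(String blockId) {
        return blockId;
    }

    /** Builds a small batch by taking one block from each volume in turn. */
    static List<String> nextBatch(List<Deque<String>> volumes, int batchSize) {
        List<String> batch = new ArrayList<>();
        boolean tookAny = true;
        while (batch.size() < batchSize && tookAny) {
            tookAny = false;
            for (Deque<String> volume : volumes) {
                if (batch.size() >= batchSize) {
                    break;
                }
                String block = volume.poll(); // null when the volume is drained
                if (block != null) {
                    batch.add(block);
                    tookAny = true;
                }
            }
        }
        return batch;
    }

    /** Repeats lock, submit, unlock, wait until no blocks remain. */
    static int decommission(List<Deque<String>> volumes, int batchSize,
                            ExecutorService replicationQueue, long timeoutMillis) {
        int replicated = 0;
        while (true) {
            List<Future<String>> pending = new ArrayList<>();
            LOCK.lock();
            try {
                for (String block : nextBatch(volumes, batchSize)) {
                    Callable<String> task = () -> replicate(block);
                    pending.add(replicationQueue.submit(task));
                }
            } finally {
                LOCK.unlock(); // released before waiting on the batch
            }
            if (pending.isEmpty()) {
                return replicated; // nothing left to schedule
            }
            for (Future<String> future : pending) {
                try {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    replicated++;
                } catch (ExecutionException | TimeoutException e) {
                    future.cancel(true); // a real version would likely re-queue the block
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return replicated;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Deque<String>> volumes = new ArrayList<>();
        volumes.add(new ArrayDeque<>(Arrays.asList("a1", "a2", "a3")));
        volumes.add(new ArrayDeque<>(Arrays.asList("b1")));
        volumes.add(new ArrayDeque<>(Arrays.asList("c1", "c2")));
        ExecutorService replicationQueue = Executors.newFixedThreadPool(2);
        try {
            int done = decommission(volumes, 2, replicationQueue, 1000);
            System.out.println(done + " blocks replicated");
        } finally {
            replicationQueue.shutdown();
        }
    }
}
```

In this sketch the timeout applies to each `Future` separately rather than to the batch as a whole, and a block that times out is simply dropped. Because every block is drained from its volume in a fixed rotation, no block can be skipped forever, which is the property the random queue lacks.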

> Do Not Remove Blocks Sequentially During Decommission 
> ------------------------------------------------------
>                 Key: HDFS-13157
>                 URL: https://issues.apache.org/jira/browse/HDFS-13157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HDFS-13157.1.patch
> From what I understand of [DataNode decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java], it
appears that all the blocks are scheduled for removal _in order_. I'm not 100% sure what
the ordering is exactly, but I think it loops through each data volume and schedules each
block to be replicated elsewhere. The net effect is that during a decommission, all of the
DataNode transfer threads slam on a single volume until it is cleaned out, at which point
they all slam on the next volume, and so on.
> Please randomize the block list so that there is a more even distribution across all
volumes when decommissioning a node.

This message was sent by Atlassian Jira

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
