hadoop-hdfs-issues mailing list archives

From "Nathan Roberts (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2537) re-replicating under replicated blocks should be more dynamic
Date Thu, 03 Nov 2011 21:01:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143525#comment-13143525 ]

Nathan Roberts commented on HDFS-2537:
--------------------------------------



IIRC, the basic algorithm for this is:
{code}
Every dfs.replication-interval (default 3 seconds):
   Choose 2 * nr_live_nodes under-replicated blocks as datanode work
   Prune this list by making sure no datanode has more than dfs.max-repl-streams (default 2) active
Work is distributed to datanodes as they heartbeat in
{code}
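Translated into a sketch (illustrative Java, not the actual NameNode code; the class name, method shape, and constants here are mine), the choose-then-prune step above might look like:

```java
import java.util.*;

/** Illustrative sketch of the scheduling loop described above; these are
 *  not the real NameNode classes, and placement rules are ignored. */
public class ReplicationScheduler {
    static final int WORK_MULTIPLIER = 2;   // blocks chosen per live node per interval
    static final int MAX_REPL_STREAMS = 2;  // max active replication streams per datanode

    /** Pick up to 2 * liveNodes under-replicated blocks, then drop any
     *  assignment that would push a source datanode past its stream limit.
     *  The resulting per-node lists are handed out as each node heartbeats in. */
    static Map<String, List<String>> schedule(List<String> underReplicated,
                                              Map<String, String> blockToSource,
                                              int liveNodes) {
        int budget = WORK_MULTIPLIER * liveNodes;
        Map<String, List<String>> work = new HashMap<>();
        for (String block : underReplicated) {
            if (budget-- <= 0) break;                       // static per-interval budget
            String source = blockToSource.get(block);
            List<String> assigned = work.computeIfAbsent(source, k -> new ArrayList<>());
            if (assigned.size() < MAX_REPL_STREAMS) {       // prune: per-node stream cap
                assigned.add(block);
            }
        }
        return work;
    }
}
```

Because both the per-interval budget and the per-node stream cap are fixed constants, the schedule can't speed up when the cluster is idle, which is exactly the static behavior this issue is about.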

Seems like more responsibility needs to be given to the datanodes to properly pace this activity. Each datanode probably needs to be told how much of its resources it can use for re-replication, adjusted during heartbeats (sort of like how balancer bandwidth is adjusted). The namenode also has to be able to give datanodes more work during each individual heartbeat so they can re-replicate at the prescribed pace.
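A minimal sketch of that idea, assuming a hypothetical datanode-side throttler whose cap the namenode rewrites in each heartbeat response (the class and method names are invented for illustration; only the balancer-bandwidth analogy comes from HDFS, where the limit is pushed out via dfsadmin -setBalancerBandwidth):

```java
/** Hypothetical datanode-side pacing knob: a bytes-per-second cap that the
 *  namenode could adjust in each heartbeat response, analogous to how
 *  balancer bandwidth is adjusted today. Names are illustrative. */
public class ReplicationThrottler {
    private volatile long bytesPerSec;   // current cap, adjustable at any time
    private long allowance;              // bytes still usable in this 1s window
    private long windowStart = System.nanoTime();

    public ReplicationThrottler(long initialBytesPerSec) {
        this.bytesPerSec = initialBytesPerSec;
        this.allowance = initialBytesPerSec;
    }

    /** Called when a heartbeat response carries a new limit. */
    public void updateLimit(long newBytesPerSec) {
        bytesPerSec = newBytesPerSec;
    }

    /** Non-blocking check: may `bytes` be sent under the current cap? */
    public synchronized boolean tryAcquire(long bytes) {
        long now = System.nanoTime();
        if (now - windowStart >= 1_000_000_000L) {  // roll over to a new window
            windowStart = now;
            allowance = bytesPerSec;                // new limit takes effect here
        }
        if (bytes <= allowance) {
            allowance -= bytes;
            return true;
        }
        return false;
    }
}
```

The point of making the cap a mutable field is that the namenode can raise it when the cluster is idle and lower it under load, instead of baking one conservative rate into configuration.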

I think it would also be beneficial if the re-replication work could be scheduled as a coordinated piece of work instead of as a slew of individual blocks that need re-replicating. For example, if you have a 1000-node cluster, each node with 100,000 blocks, then ideally each node would only need to connect to one other node and stream its share of the under-replicated blocks to it ("one" isn't exactly precise due to replication rules, but you get the point). Instead of 100,000 random block re-replications, the work is coordinated by figuring out what could be moved from node A to node B while still obeying replication rules. This would allow node A to do almost all of its work with exactly one other node (possibly on-rack), which would then allow for other efficiencies (connection caching, a single stream of blocks, etc.)
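A toy version of such a planner, assuming we already know which surviving node holds a copy of each lost block; rack awareness, existing-replica checks, and source-equals-target conflicts are deliberately omitted, and all names are illustrative:

```java
import java.util.*;

/** Sketch of "coordinated" re-replication: rather than scheduling each block
 *  independently, group the lost replicas by surviving source node and pair
 *  each source with a single target, so one connection can carry a whole
 *  stream of blocks. Placement rules are ignored for brevity. */
public class CoordinatedPlanner {
    /** Returns a map of "source->target" transfer pairs to the list of
     *  blocks that pair should move, using simple round-robin pairing. */
    static Map<String, List<String>> plan(Map<String, List<String>> blocksBySource,
                                          List<String> targets) {
        Map<String, List<String>> transfers = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, List<String>> e : blocksBySource.entrySet()) {
            String target = targets.get(i++ % targets.size()); // one peer per source
            transfers.put(e.getKey() + "->" + target, e.getValue());
        }
        return transfers;
    }
}
```

Each entry in the result is a single long-lived transfer between two nodes, which is what would enable the connection-caching and single-stream efficiencies mentioned above.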
                
> re-replicating under replicated blocks should be more dynamic
> -------------------------------------------------------------
>
>                 Key: HDFS-2537
>                 URL: https://issues.apache.org/jira/browse/HDFS-2537
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Nathan Roberts
>
> When a node fails or is decommissioned, a large number of blocks become under-replicated. Since re-replication work is distributed, the hope would be that all blocks could be restored to their desired replication factor in very short order. This doesn't happen, though, because the load the cluster is willing to devote to this activity is mostly static (controlled by configuration variables). Since it's mostly static, the rate has to be set conservatively to avoid overloading the cluster with replication work.
> This problem is especially noticeable when you have lots of small blocks. It can take many hours to re-replicate the blocks that were on a node while the cluster is mostly idle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
