hadoop-hdfs-issues mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-782) dynamic replication
Date Sat, 21 Nov 2009 01:39:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780872#action_12780872 ]

Ning Zhang commented on HDFS-782:

To elaborate on the proposal: a data node keeps statistics on how many clients are requesting
a given block. If that number exceeds a threshold, the data node sends the block
to a number of other data nodes (children) and asks them to replicate it (one heuristic is
to choose from the data nodes that asked for the block). If a child data node accepts the
replication request (e.g., it does not already hold the block), it goes through the same
protocol as adding a new replica, acknowledged by the name node. We propose datanode->datanode
replication rather than datanode->namenode->datanode replication because the former is much
faster than the latter, whose latency depends on the name node's workload and
could be minutes. If the children also receive too many requests, they can proactively
replicate the block recursively, until the requests are spread over a sufficient number
of replicas.
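
As a rough illustration of the data-node side of this proposal, here is a minimal Java sketch.
It is not HDFS code: the class HotBlockTracker, its methods, the sliding time window, and the
parameters threshold, windowMs and maxChildren are all hypothetical names and assumptions chosen
for the example. It counts recent requests per block, reports when the count exceeds the
threshold, and picks children from the nodes that recently asked for the block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of per-block request statistics on a data node; not part of HDFS. */
class HotBlockTracker {
    /** One observed read request: who asked, and when. */
    private static final class Request {
        final String requester;
        final long timeMs;
        Request(String requester, long timeMs) {
            this.requester = requester;
            this.timeMs = timeMs;
        }
    }

    private final int threshold;   // assumed: a block is "hot" above this many recent requests
    private final long windowMs;   // assumed: how far back a request still counts as recent
    private final Map<Long, Deque<Request>> recent = new HashMap<>();

    HotBlockTracker(int threshold, long windowMs) {
        this.threshold = threshold;
        this.windowMs = windowMs;
    }

    /** Records a request; returns true if the block now exceeds the threshold. */
    boolean recordRequest(long blockId, String requester, long nowMs) {
        Deque<Request> q = recent.computeIfAbsent(blockId, k -> new ArrayDeque<>());
        q.addLast(new Request(requester, nowMs));
        // Drop requests that have fallen out of the time window.
        while (!q.isEmpty() && nowMs - q.peekFirst().timeMs > windowMs) {
            q.removeFirst();
        }
        return q.size() > threshold;
    }

    /** Heuristic from the proposal: choose children among nodes that asked for the block. */
    List<String> chooseChildren(long blockId, int maxChildren) {
        Set<String> distinct = new LinkedHashSet<>();
        Deque<Request> q = recent.get(blockId);
        if (q != null) {
            for (Request r : q) {
                distinct.add(r.requester);
            }
        }
        List<String> children = new ArrayList<>();
        for (String node : distinct) {
            if (children.size() >= maxChildren) {
                break;
            }
            children.add(node);
        }
        return children;
    }
}
```

A child that is itself flooded with requests would run the same tracker on its own copy, which
gives the recursive behaviour described above. Sending the block and registering the new replica
with the name node are left out of the sketch.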

Currently the name node cleans up extra replicas periodically. To support DN->DN dynamic
replication, we need to add a heuristic so that it cleans up extra replicas only when they have
not been accessed for a certain period.
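
The cleanup heuristic could look roughly like the following Java sketch. Again this is not the
name node's actual code: ExtraReplicaPolicy, recordAccess, removableReplicas, targetReplication
and idleMs are hypothetical names, and for brevity the sketch tracks the last access per block
rather than per replica.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an access-aware cleanup rule; not part of HDFS. */
class ExtraReplicaPolicy {
    private final int targetReplication; // assumed: the configured replication factor
    private final long idleMs;           // assumed: idle time after which extras may be removed
    private final Map<Long, Long> lastAccessMs = new HashMap<>();

    ExtraReplicaPolicy(int targetReplication, long idleMs) {
        this.targetReplication = targetReplication;
        this.idleMs = idleMs;
    }

    /** Remembers the most recent access time reported for a block. */
    void recordAccess(long blockId, long nowMs) {
        lastAccessMs.put(blockId, nowMs);
    }

    /** Returns how many replicas of the block may be removed at time nowMs. */
    int removableReplicas(long blockId, int currentReplicas, long nowMs) {
        int extra = currentReplicas - targetReplication;
        if (extra <= 0) {
            return 0; // nothing above the target, nothing to clean up
        }
        Long last = lastAccessMs.get(blockId);
        // A block with no recorded access is treated as idle.
        boolean idle = (last == null) || (nowMs - last >= idleMs);
        return idle ? extra : 0;
    }
}
```

With this rule a hot block keeps its dynamically added replicas while it is still being read,
and the periodic cleanup reclaims them once the block has gone quiet.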

Any suggestions?

> dynamic replication
> -------------------
>                 Key: HDFS-782
>                 URL: https://issues.apache.org/jira/browse/HDFS-782
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Ning Zhang
> In a large and busy cluster, a block can be requested by many clients at the same time.
> HDFS-767 tries to solve the failing case when the # of retries exceeds the maximum # of retries.
> However, that patch doesn't solve the performance issue, since all failing clients have to
> wait a certain period before retrying, and the # of retries could be high.
> One solution to the performance issue is to increase the # of replicas for this
> "hot" block dynamically when it is requested many times in a short period. The name node needs
> to be aware of such a situation and only clean up extra replicas when they have not been
> accessed recently.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
