hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2537) re-replicating under replicated blocks should be more dynamic
Date Sun, 13 Nov 2011 18:17:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149325#comment-13149325 ]

Uma Maheswara Rao G commented on HDFS-2537:
-------------------------------------------

*Dynamic Re-replication Analysis:*
      Assuming an average block size of *64 MB*, generating replication work at a rate of 2 * num_live_nodes is not a low generation rate. The main concern is how the blocks queued in each DataNode descriptor's replicateQueues are consumed on heartbeats. Currently the consumption rate depends mainly on dfs.namenode.replication.max-streams. Its default value is 2, so at any point in time at most two threads per DataNode work on the replication work.
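
For reference, here is a minimal sketch of how the per-heartbeat cap behaves today: the NameNode subtracts the transfers the DataNode reported as already in progress (xmitsInProgress) from dfs.namenode.replication.max-streams. The helper class and method names below are mine, not the actual FSNamesystem code:

{code:java}
/** Sketch (not actual FSNamesystem code) of today's per-heartbeat cap. */
class HeartbeatTransferCap {
  static int maxTransfers(int maxReplicationStreams, int xmitsInProgress) {
    // With the default max-streams of 2, a DataNode that already has two
    // DataTransfer threads running receives no new replication work.
    return Math.max(0, maxReplicationStreams - xmitsInProgress);
  }

  public static void main(String[] args) {
    System.out.println(maxTransfers(2, 0)); // 2 -> two new transfers scheduled
    System.out.println(maxTransfers(2, 2)); // 0 -> DataNode is saturated
  }
}
{code}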
  Here is one simple solution to accelerate the work depending on DataNode load. We would need the following config items:
*dfs.namenode.replication.max-streams*: the maximum number of DataNode threads that may work on replication work. For this proposal, consider a default value of 20.
*dfs.namenode.replication.min-streams*: the minimum number of DataNode threads that should work on replication work. Consider a default value of 2.
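
A sketch of how the two keys could be read via the standard Hadoop Configuration API. Note that dfs.namenode.replication.min-streams does not exist today, and the default of 20 for max-streams is part of this proposal, not current behavior:

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Sketch: reading the proposed replication stream bounds. */
class ReplicationStreamConf {
  static final String MAX_STREAMS_KEY = "dfs.namenode.replication.max-streams";
  static final String MIN_STREAMS_KEY = "dfs.namenode.replication.min-streams"; // proposed key

  final int maxStreams;
  final int minStreams;

  ReplicationStreamConf(Configuration conf) {
    maxStreams = conf.getInt(MAX_STREAMS_KEY, 20); // proposed default
    minStreams = conf.getInt(MIN_STREAMS_KEY, 2);  // matches today's default cap
  }
}
{code}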
   To create the DNA_TRANSFER command, we take the list of blocks to replicate from the corresponding DataNode descriptor. First we need to determine the source (SRC) DataNode's capacity.
  Below are the conditions for finding the capacity (a sketch follows the list):
1)	If the SRC DN's dataXceiver count is at 20% of its limit, then 80% of dfs.namenode.replication.max-streams can work on replication.
2)	If the SRC DN's dataXceiver count is at 0-5%, then 95% of dfs.namenode.replication.max-streams should work on replication (keeping 5% as tolerance).
3)	If the SRC DN's dataXceiver count is at 95% or above (again keeping 5% as tolerance), then only dfs.namenode.replication.min-streams should work on replication.
4)	If the SRC DN's dataXceiver count is at any other percentage, the capacity is calculated as in option 1.
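
A minimal sketch of the tiered rule above, with the thresholds taken verbatim from the list. The helper class is hypothetical, and "load" is assumed to be the SRC DN's current dataXceiver count as a fraction of its xceiver limit:

{code:java}
/** Sketch of the tiered source-capacity rule (hypothetical helper). */
class SourceCapacity {
  static int replicationCapacity(double load, int maxStreams, int minStreams) {
    if (load <= 0.05) {
      // Nearly idle: 95% of max-streams, keeping 5% as tolerance.
      return (int) (0.95 * maxStreams);
    }
    if (load >= 0.95) {
      // Nearly saturated: fall back to the minimum.
      return minStreams;
    }
    // All other loads are treated like the 20% case: 80% of max-streams.
    return (int) (0.80 * maxStreams);
  }

  public static void main(String[] args) {
    System.out.println(replicationCapacity(0.02, 20, 2)); // 19
    System.out.println(replicationCapacity(0.20, 20, 2)); // 16
    System.out.println(replicationCapacity(0.98, 20, 2)); // 2
  }
}
{code}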

 Once the SRC DN's capacity is known, we need to find enough blocks and targets to fill it.
The targets must also have the capacity to work on the replication task. If a block's targets do not have the capacity, that block should not be selected for this heartbeat. Collect all such unselected blocks and place them at the front of the queue, so they are checked first on the next heartbeat (see the selection sketch below). This way the replication work per DataNode moves between *dfs.namenode.replication.min-streams* and *dfs.namenode.replication.max-streams*.
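
A sketch of the per-heartbeat selection loop just described. The Block interface and targetsHaveCapacity check are hypothetical stand-ins; the point is that deferred blocks go back to the front of the queue rather than being dropped:

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of selecting replication work up to the SRC DN's capacity. */
class ReplicationWorkPicker {
  interface Block { boolean targetsHaveCapacity(); }

  static List<Block> pick(Deque<Block> replicateQueue, int srcCapacity) {
    List<Block> selected = new ArrayList<>();
    List<Block> deferred = new ArrayList<>();
    while (selected.size() < srcCapacity && !replicateQueue.isEmpty()) {
      Block b = replicateQueue.pollFirst();
      if (b.targetsHaveCapacity()) {
        selected.add(b);   // goes into this heartbeat's DNA_TRANSFER command
      } else {
        deferred.add(b);   // skipped for this heartbeat only
      }
    }
    // Re-queue deferred blocks at the front, preserving their order,
    // so they are re-examined first on the next heartbeat.
    for (int i = deferred.size() - 1; i >= 0; i--) {
      replicateQueue.addFirst(deferred.get(i));
    }
    return selected;
  }
}
{code}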
Below is the drawback with this solution:
When the NN generates the work eagerly and chooses the targets, it can only consider the current load on the DNs. The problem is that there is a high chance of choosing the same targets as were chosen for the previous block, because the work already scheduled on those targets is not yet reflected in their dataXceiver counts. Work distribution therefore will not properly account for the actual load on the DNs.
This can be solved by tracking the estimated future work per DN and considering that future load while choosing the next target for replication. In addition, the DN should report which blocks are in progress in its DataTransfer threads. DataTransfer threads write blocks with the WRITE opcode, so each such write already increments the xceiver count on the target; since the DN is already counting it, the NN should decrement it from the estimated future load to avoid double counting.
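
A sketch of that bookkeeping, with hypothetical names: the NN bumps a target's estimated future load when scheduling a transfer to it, and decrements once the DN's report shows the transfer as an in-progress WRITE, since from that point the load is already visible in the reported xceiver count:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of per-DataNode future-load tracking on the NameNode. */
class FutureLoadTracker {
  private final AtomicInteger estimatedFutureLoad = new AtomicInteger();

  /** Called when the NN picks this DN as a replication target. */
  void onTargetScheduled()  { estimatedFutureLoad.incrementAndGet(); }

  /** Called when the DN reports the block as in progress in a DataTransfer
   *  thread; the WRITE already shows up in the xceiver count, so stop
   *  counting it as future load to avoid double counting. */
  void onTransferReported() { estimatedFutureLoad.decrementAndGet(); }

  /** Load to use when choosing targets: reported + not-yet-visible work. */
  int effectiveLoad(int reportedXceiverCount) {
    return reportedXceiverCount + estimatedFutureLoad.get();
  }
}
{code}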

 This will not work in the *Federation* case, because one NN does not know about the future replication load that other NNs have calculated for the same DNs.
So the above problem still arises with Federation. For Federation we still need to think more to find a better solution.

                
> re-replicating under replicated blocks should be more dynamic
> -------------------------------------------------------------
>
>                 Key: HDFS-2537
>                 URL: https://issues.apache.org/jira/browse/HDFS-2537
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Nathan Roberts
>
> When a node fails or is decommissioned, a large number of blocks become under-replicated.
> Since re-replication work is distributed, the hope would be that all blocks could be restored
> to their desired replication factor in very short order. This doesn't happen though because
> the load the cluster is willing to devote to this activity is mostly static (controlled by
> configuration variables). Since it's mostly static, the rate has to be set conservatively
> to avoid overloading the cluster with replication work.
> This problem is especially noticeable when you have lots of small blocks. It can take
> many hours to re-replicate the blocks that were on a node while the cluster is mostly idle.



        
