hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1912) Datanode should support block replacement
Date Sat, 20 Oct 2007 01:29:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536388 ]

Raghu Angadi commented on HADOOP-1912:

Pretty much looks fine.

# I could not find a throttling test.
# Regarding the throttler : each connection is individually throttled. Ideally, I think we should use
one throttler that is shared by all connections. This will make sure we use up the allowed bandwidth
whenever possible. In the current scheme, the transfer rate between A & B cannot use extra bandwidth
if another connection between B & C cannot use its quota (because C has many connections).
Also, when the throttler is shared, small blocks cannot slip under the radar.
## Please make ThrottlerBase package private so that it can be used by HADOOP-2012
# minor : in FSNamesystem.java : {code}
        if (priSet.contains(delNodeHint)) {
          cur = delNodeHint;
        } else if (addedNode != null && !priSet.contains(addedNode)) {
          cur = delNodeHint;
/// Can be replaced by
        if (addedNode != null || priSet.contains(delNodeHint)) {
          cur = delNodeHint;
{code}
# minor : it increases allocation in addBlock() in FSNamesystem.java. Is the current implementation
more correct?
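
To make the shared-throttler suggestion above concrete, here is a minimal Java sketch. It is not the code in the patch: the class name SharedThrottler, the helper delayMillis, and the one-second accounting window are all assumptions made for illustration. Every connection calls throttle() on the same instance, so a connection that cannot use its quota leaves the remaining budget to the others, and bytes from small blocks are counted against the same total.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one throttler instance shared by every balancing connection. */
final class SharedThrottler {
    private final long bytesPerSecond; // budget for the whole datanode, not per connection
    private long windowStartNanos;     // start of the current accounting window
    private long bytesInWindow;        // bytes reported by all connections in this window

    SharedThrottler(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.windowStartNanos = System.nanoTime();
    }

    /** How long the aggregate stream must pause to fall back to the budget (0 if none). */
    static long delayMillis(long bytesInWindow, long elapsedMillis, long bytesPerSecond) {
        long earliestMillis = bytesInWindow * 1000L / bytesPerSecond;
        return Math.max(0L, earliestMillis - elapsedMillis);
    }

    /** Called by any connection after it has sent or received numOfBytes. */
    synchronized void throttle(long numOfBytes) throws InterruptedException {
        if (numOfBytes <= 0) {
            return;
        }
        long now = System.nanoTime();
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(now - windowStartNanos);
        if (elapsedMillis >= 1000L) {
            // Assumed policy: start a new window so a long idle period is not banked as credit.
            windowStartNanos = now;
            bytesInWindow = 0L;
            elapsedMillis = 0L;
        }
        bytesInWindow += numOfBytes;
        long delay = delayMillis(bytesInWindow, elapsedMillis, bytesPerSecond);
        if (delay > 0L) {
            // Sleeping while holding the lock also stalls the other connections,
            // which is intended here: the datanode as a whole is over budget.
            Thread.sleep(delay);
        }
    }
}
```

With a 5MB/s budget, a single active connection can use the full rate, whereas a fixed per-connection split would cap it at its own share even when the other connections are idle.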

> Datanode should support block replacement
> -----------------------------------------
>                 Key: HADOOP-1912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1912
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: replace.patch, replace1.patch, replace2.patch, replace3.patch
> This jira covers Data Node's support for rebalancing (HADOOP-1652). When a balancer decides
to move a block B from source S to destination D, it also chooses a proxy source PS, which
contains a replica of B, to speed up the block copy. The block placement is carried out in the following steps:
> 1. A block copy command is sent to datanode PS in the format of  "OP_BLOCK_COPY <block_id_of_B>
<source S> <destination D>". It requests PS to copy B to datanode D.
> 2. PS then transfers block B to datanode D with a block replacement command to D in the
format of "OP_BLOCK_REPLACEMENT <block_id_of_B> <source S> <data_of_B>".

> 3. Datanode D writes block B to its disk and then sends the namenode a blockReceived
RPC, informing the namenode that block B has been received and asking it to delete a replica of B from
source S if there are excess replicas.
> 4. The namenode then adds datanode D to block B's map and removes an excess replica
of B, preferring the replica on datanode S.
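
The namenode bookkeeping in steps 3 and 4 can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class ReplicaMapSketch, its method names, the use of plain strings for block and datanode ids, and the fallback choice when the hinted node cannot be used are all hypothetical and are not taken from FSNamesystem.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the namenode bookkeeping in steps 3 and 4. */
final class ReplicaMapSketch {
    private final int targetReplication;
    private final Map<String, Set<String>> blockToNodes = new HashMap<>();

    ReplicaMapSketch(int targetReplication) {
        this.targetReplication = targetReplication;
    }

    void addReplica(String blockId, String node) {
        blockToNodes.computeIfAbsent(blockId, k -> new LinkedHashSet<>()).add(node);
    }

    Set<String> replicas(String blockId) {
        return blockToNodes.getOrDefault(blockId, new LinkedHashSet<>());
    }

    /** The receiver reports the block; delHint names the source S to drop if possible. */
    void blockReceived(String blockId, String receiver, String delHint) {
        addReplica(blockId, receiver);
        Set<String> nodes = blockToNodes.get(blockId);
        if (nodes.size() <= targetReplication) {
            return; // no excess replica, so nothing is deleted
        }
        String victim = null;
        if (delHint != null && !delHint.equals(receiver) && nodes.contains(delHint)) {
            victim = delHint; // favor the hinted source S
        } else {
            // Assumed fallback: drop the first replica that is not the new one.
            for (String node : nodes) {
                if (!node.equals(receiver)) {
                    victim = node;
                    break;
                }
            }
        }
        if (victim != null) {
            nodes.remove(victim);
        }
    }
}
```

For a block with replicas on S, PS and X and a target replication of 3, calling blockReceived("B", "D", "S") leaves the replicas on PS, X and D.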
> In addition, each data node has a limited bandwidth for rebalancing. The default value
for the bandwidth is 5MB/s. Throttling is done at both the source and destination sides. Each
data node limits the maximum number of concurrent data transfers (including both sending and receiving)
for rebalancing to 5. In the worst case, with 5 transfers sharing 5MB/s, each data transfer is limited
to a bandwidth of 1MB/s. Each sender and receiver has a Throttler. The primary method of the class is
"throttle( int numOfBytes )". The parameter numOfBytes indicates the total number of bytes that the caller
has sent or received since throttle was last called. The method calculates the caller's
I/O rate. If the rate is faster than the bandwidth limit, it sleeps to slow down the data
transfer. After it wakes up, it adjusts its bandwidth limit if the number of concurrent data
transfers has changed.
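
The per-transfer Throttler described above might look roughly like the following Java sketch. Only the throttle(int numOfBytes) signature comes from the description; the class name PerTransferThrottler, the IntSupplier hook for the number of concurrent transfers, and the equal-split rule in share() are assumptions for illustration.

```java
import java.util.function.IntSupplier;

/** Hypothetical per-transfer throttler following the description above. */
final class PerTransferThrottler {
    private final long totalBytesPerSecond;    // datanode-wide balancing budget, e.g. 5MB/s
    private final IntSupplier activeTransfers; // assumed hook reporting concurrent transfers
    private long limitBytesPerSecond;          // this transfer's current share
    private long periodStartMillis;
    private long bytesInPeriod;

    PerTransferThrottler(long totalBytesPerSecond, IntSupplier activeTransfers) {
        this.totalBytesPerSecond = totalBytesPerSecond;
        this.activeTransfers = activeTransfers;
        this.limitBytesPerSecond = share(totalBytesPerSecond, activeTransfers.getAsInt());
        this.periodStartMillis = System.currentTimeMillis();
    }

    /** Assumed rule: equal split of the budget among transfers, at least 1 byte/s. */
    static long share(long totalBytesPerSecond, int transfers) {
        return Math.max(1L, totalBytesPerSecond / Math.max(1, transfers));
    }

    /** Pause needed so that bytes sent over elapsedMillis stay within the limit. */
    static long sleepMillis(long bytes, long elapsedMillis, long limitBytesPerSecond) {
        return Math.max(0L, bytes * 1000L / limitBytesPerSecond - elapsedMillis);
    }

    /** numOfBytes is the number of bytes moved since the previous call. */
    void throttle(int numOfBytes) throws InterruptedException {
        if (numOfBytes <= 0) {
            return;
        }
        bytesInPeriod += numOfBytes;
        long elapsed = System.currentTimeMillis() - periodStartMillis;
        long pause = sleepMillis(bytesInPeriod, elapsed, limitBytesPerSecond);
        if (pause > 0L) {
            Thread.sleep(pause);
        }
        // The description adjusts the limit after waking up; this sketch
        // re-checks on every call and restarts accounting when the share changes.
        long newLimit = share(totalBytesPerSecond, activeTransfers.getAsInt());
        if (newLimit != limitBytesPerSecond) {
            limitBytesPerSecond = newLimit;
            periodStartMillis = System.currentTimeMillis();
            bytesInPeriod = 0L;
        }
    }
}
```

With a 5MB/s budget and 5 concurrent transfers, share() yields the 1MB/s worst case mentioned in the description.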

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
