hadoop-hdfs-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2171) Changes to balancer bandwidth should not require datanode restart.
Date Tue, 26 Jul 2011 17:45:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071242#comment-13071242 ]

Eric Payne commented on HDFS-2171:

Unit Tests: I added ./src/test/org/apache/hadoop/hdfs/TestBalancerBandwidth.java. Unit tests

Manual Tests: The following tests passed:
1) Prior to starting daemons, set dfs.balance.bandwidthPerSec to 1M (1048576)
2) Set up a cluster as follows:
  On different hosts, start the following daemons:
    1 JT
    1 NN
    1 SNN
    1 DN/TT

3) On client, run randomwriter utility to fill up the DN
$ $HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR jar $HADOOP_HOME/hadoop-examples.jar \
    randomwriter input_$(date +%s)

4) Start 3 more DN/TT daemons on separate hosts

5) On NN, run balancer:
$ sudo -u hdfs bash -c "export HADOOP_HOME=$HADOOP_HOME; export HADOOP_CONF_DIR=$HADOOP_CONF_DIR;
$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR balancer -threshold 10"

6) Watch network bandwidth on DN that was started in step 2). TX bandwidth should not go much
above 1M bps. Note that it will be a little higher, but only because there are additional
communications happening on that host besides balancing.

7) Watch network bandwidth on other 3 DN hosts. RX bandwidth should not exceed 1M bps (by
much).

8) On client as hdfs user, change balancer bandwidth value to 20M:
$ $HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR dfsadmin -setBalancerBandwidth 20971520

9) Watch network bandwidth on nodes. TX and RX network bandwidth should not exceed 20M bps
(by much).
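
Steps 6), 7) and 9) only say to "watch network bandwidth", so here is a minimal sketch of one way to do that on a Linux host by sampling the kernel's per-interface byte counters. The helper names (mib_to_bytes, bytes_per_sec, iface_bytes, watch_bandwidth) and the default interface eth0 are assumptions made for illustration; they are not part of the patch. It also assumes the "1M" and "20M" figures above are bytes per second, which matches the values 1048576 and 20971520 used in steps 1) and 8).

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default interface are illustrative assumptions.

# Convert a size in MiB to bytes (1 -> 1048576, 20 -> 20971520),
# i.e. the form of value passed to -setBalancerBandwidth in step 8).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Average bytes per second, given two cumulative counter readings
# and the number of seconds between them (integer division).
bytes_per_sec() {
  local start=$1 end=$2 seconds=$3
  echo $(( (end - start) / seconds ))
}

# Read the cumulative RX or TX byte counter for an interface from Linux sysfs.
# direction must be "rx" or "tx".
iface_bytes() {
  local iface=$1 direction=$2
  cat "/sys/class/net/${iface}/statistics/${direction}_bytes"
}

# Print RX and TX rates for one interface every INTERVAL seconds until interrupted.
# Note that this counts all traffic on the interface, not only balancer traffic.
watch_bandwidth() {
  local iface=${1:-eth0} interval=${2:-5}
  local rx0 tx0 rx1 tx1
  while true; do
    rx0=$(iface_bytes "$iface" rx); tx0=$(iface_bytes "$iface" tx)
    sleep "$interval"
    rx1=$(iface_bytes "$iface" rx); tx1=$(iface_bytes "$iface" tx)
    echo "$(date +%T) rx=$(bytes_per_sec "$rx0" "$rx1" "$interval") B/s" \
         "tx=$(bytes_per_sec "$tx0" "$tx1" "$interval") B/s"
  done
}
```

Running "watch_bandwidth eth0 5" on each DN host during steps 6), 7) and 9) prints one line every five seconds. The printed rates can be compared against "mib_to_bytes 1" before the change and "mib_to_bytes 20" after it, allowing for the non-balancer traffic mentioned in step 6).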

> Changes to balancer bandwidth should not require datanode restart.
> ------------------------------------------------------------------
>                 Key: HDFS-2171
>                 URL: https://issues.apache.org/jira/browse/HDFS-2171
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer, data-node
>    Affects Versions:
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>             Fix For:
>         Attachments: HDFS-2171.patch
> Currently in order to change the value of the balancer bandwidth (dfs.datanode.balance.bandwidthPerSec),
> the datanode daemon must be restarted.
> The optimal value of the bandwidthPerSec parameter is not always (almost never) known
> at the time of cluster startup, but only once a new node is placed in the cluster and balancing
> is begun. If the balancing is taking too long (bandwidthPerSec is too low) or the balancing
> is taking up too much bandwidth (bandwidthPerSec is too high), the cluster must go into a
> "maintenance window" where it is unusable while all of the datanodes are bounced. In large
> clusters of thousands of nodes, this can be a real maintenance problem because these "maintenance
> windows" can take a long time and there may have to be several of them while the bandwidthPerSec
> is experimented with and tuned.
> A possible solution to this problem would be to add a -bandwidth parameter to the balancer
> tool. If bandwidth is supplied, pass the value to the datanodes via the OP_REPLACE_BLOCK and
> OP_COPY_BLOCK DataTransferProtocol requests. This would make it necessary, however, to change
> the DataTransferProtocol version.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

