hadoop-hdfs-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
Date Sat, 29 Mar 2014 12:18:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951867#comment-13951867 ]

Yu Li commented on HDFS-6010:

Thanks for the review and comments, Tsz.
> I think "-datanodes" may be a better name than "-servers"...How about adding a new conf property,
> say dfs.balancer.selectedDatanodes?
IMHO, making it a CLI option lets the user choose dynamically which nodes to balance
among, while a property is static. In our use case, the admin might balance groupA and groupB
separately, and a CLI option would make that easier, right?
I agree to rename the option to "-datanodes" if we decide to keep it as a CLI option.

> How about moving it to the balancer package and renaming it to BalancerUtil?
Agree to move it to the balancer package. As for the name, since it currently only validates
whether a given string matches a live datanode, the name "BalancerUtil" seems too broad
to me. :-)

> a balancer may run for a long time and some datanodes could be down. I think we should not
> throw exceptions. Perhaps, printing a warning is good enough
It's true that some datanodes could be down, but I'd like to discuss this scenario further.
Assume groupA has 3 nodes and node #1 is down. When the admin issues a command like "-datanodes
1,2,3", he means to balance the data distribution across all 3 nodes. If we only print
warnings, the balancer will first balance data between nodes #2 and #3, and once node #1 is
back the admin has to run another round of balancing. Since each balancing run adds read locks
to the involved blocks and causes disk/network IO, in our production environment we would prefer
to fail the first attempt and wait until all datanodes are back. So I'd like to ask for a second
thought on whether to throw an exception or print a warning here.
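
To make the two behaviours under discussion concrete, here is a minimal sketch in plain Java. It is not taken from the HDFS-6010 patch: the class name DatanodeSelectionSketch, the methods parse and select, and the strict flag are all hypothetical, and the set of live datanodes is simply passed in rather than fetched from the NameNode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hadoop or in the patch.
final class DatanodeSelectionSketch {

  /** Splits a comma-separated "-datanodes" value into trimmed, non-empty names. */
  static Set<String> parse(String optionValue) {
    Set<String> names = new LinkedHashSet<>();
    for (String part : optionValue.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  /**
   * Returns the requested datanodes that are currently live.
   * strict == true:  throw if any requested node is not live (fail the whole run).
   * strict == false: print a warning and continue with the live subset.
   */
  static List<String> select(Set<String> requested, Set<String> live, boolean strict) {
    List<String> usable = new ArrayList<>();
    List<String> missing = new ArrayList<>();
    for (String node : requested) {
      if (live.contains(node)) {
        usable.add(node);
      } else {
        missing.add(node);
      }
    }
    if (!missing.isEmpty()) {
      if (strict) {
        throw new IllegalArgumentException("Datanodes not live: " + missing);
      }
      System.err.println("WARN: skipping datanodes that are not live: " + missing);
    }
    return usable;
  }

  public static void main(String[] args) {
    // groupA has three nodes; node1 is down, so only node2 and node3 are live.
    Set<String> live = new LinkedHashSet<>(Arrays.asList("node2", "node3"));
    Set<String> requested = parse("node1,node2,node3");

    // Warning behaviour: balancing would proceed between node2 and node3 only.
    System.out.println(select(requested, live, false));

    // Exception behaviour: the run is rejected until node1 is back.
    try {
      select(requested, live, true);
    } catch (IllegalArgumentException e) {
      System.out.println("strict mode failed: " + e.getMessage());
    }
  }
}
```

With strict set to false the admin gets a partial balance and has to rerun later; with strict set to true nothing is moved until every listed datanode is live, which is the behaviour preferred above.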

> The new code could be moved to a static method (in BalancerUtil) so that it is earlier to
Agreed, I will refine the code regardless of whether we change from throwing an exception to
printing a warning.

> I have not yet checked NodeStringValidator and the new tests in details
No problem, I will wait for your comments and update the patch in one go, along with all the
changes required by the above discussion.

> Make balancer able to balance data among specified servers
> ----------------------------------------------------------
>                 Key: HDFS-6010
>                 URL: https://issues.apache.org/jira/browse/HDFS-6010
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 2.3.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>              Labels: balancer
>         Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
> Currently, the balancer tool balances data among all datanodes. However, in some particular
> cases, we need to balance data only among specified nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.

This message was sent by Atlassian JIRA
