hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
Date Fri, 28 Mar 2014 23:20:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951561#comment-13951561 ]

Tsz Wo Nicholas Sze commented on HDFS-6010:

The patch is generally good.  Some comments:
- I think "-datanodes" may be a better name than "-servers".  However, I actually suggest
not adding it as a CLI parameter since, for a large cluster, it may not be easy to specify
all the selected datanodes on the command line.  How about adding a new conf property, say
dfs.balancer.selectedDatanodes?
- The new class NodeStringValidator is unlikely to be used outside Balancer.  How about moving
it to the balancer package and renaming it to BalancerUtil?
- In initNodes(..), if target == null, it will throw an IllegalArgumentException.  However,
a balancer may run for a long time and some datanodes could be down.  I think we should not
throw exceptions.  Perhaps printing a warning is good enough.
  - The new code could be moved to a static method (in BalancerUtil) so that it is easier
to read.
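
To make the last two suggestions concrete, here is a rough sketch, not code from the patch.  It assumes
the selected datanodes arrive as a comma-separated value of the proposed dfs.balancer.selectedDatanodes
property and that live datanodes can be looked up by host name.  The method names parseSelectedDatanodes
and resolveTargets, the Map-based lookup, and the use of java.util.logging (chosen only to keep the
example self-contained) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper: the class name follows the review suggestion,
// but the methods below are illustrative and not taken from the patch.
final class BalancerUtil {
  private static final Logger LOG = Logger.getLogger(BalancerUtil.class.getName());

  // Property name proposed in the review comment (not an existing Hadoop key).
  static final String SELECTED_DATANODES_KEY = "dfs.balancer.selectedDatanodes";

  private BalancerUtil() {}

  /** Splits a comma-separated property value into trimmed, de-duplicated host names. */
  static Set<String> parseSelectedDatanodes(String value) {
    Set<String> hosts = new LinkedHashSet<>();
    if (value == null) {
      return hosts;
    }
    for (String part : value.split(",")) {
      String host = part.trim();
      if (!host.isEmpty()) {
        hosts.add(host);
      }
    }
    return hosts;
  }

  /**
   * Looks up each selected host among the live datanodes. A host that is
   * missing (for example, a datanode that went down during a long run) is
   * logged as a warning and skipped instead of raising an exception.
   */
  static <T> List<T> resolveTargets(Set<String> selected, Map<String, T> liveDatanodes) {
    List<T> targets = new ArrayList<>();
    for (String host : selected) {
      T target = liveDatanodes.get(host);
      if (target == null) {
        LOG.warning("Selected datanode " + host + " is not available; skipping it.");
        continue;
      }
      targets.add(target);
    }
    return targets;
  }
}
```

With this shape, initNodes(..) would only need to call the static helper, and a datanode that is down
shrinks the target list rather than aborting the balancer.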

I have not yet checked NodeStringValidator and the new tests in detail.

> Make balancer able to balance data among specified servers
> ----------------------------------------------------------
>                 Key: HDFS-6010
>                 URL: https://issues.apache.org/jira/browse/HDFS-6010
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 2.3.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>              Labels: balancer
>         Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
> Currently, the balancer tool balances data among all datanodes. However, in some particular
cases, we need to balance data only among specified nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.

This message was sent by Atlassian JIRA
