hadoop-hdfs-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
Date Wed, 12 Mar 2014 17:38:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932055#comment-13932055 ]

Devaraj Das commented on HDFS-6010:

[~carp84], sorry for the delay in getting back. You know how things work when there are deadlines
to meet :-) I have some follow-up questions to check my understanding.

1. How would you maintain the mapping of files to groups? (This is needed for HDFS-6012 to work.) If
the mapping is maintained, I wonder whether it would make sense to have the tool take paths for
balancing instead of servers. Then maybe you could also fold the tool that does group
management (HDFS-6012) into the balancer.
2. Are these mappings set up by some admin?
3. Would you expand a group when it is nearing capacity?
4. How does someone like HBase use this? Is HBase going to have visibility into the mappings
as well (to take care of HBASE-6721 and favored-nodes for writes)?
5. Would you need a higher-level balancer to keep the whole cluster balanced (i.e., migrate
the blocks associated with certain paths from one group to another)? Otherwise, the block
distribution could become skewed.
6. When a datanode in a group fails, how would you choose which datanodes
to replicate its blocks to? The choice matters somewhat, given that some target
datanodes might be busy serving requests for apps in their own group. Adding more work to
these datanodes might make apps in the other group suffer. But maybe it's not that big a deal.
On the other hand, if the group still has capacity, and the failure zones are still intact
for the members in the group, then the replication could take into account the mapping in
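
To make question 6 concrete, here is a minimal Python sketch of one possible group-aware choice of re-replication targets. It assumes a hypothetical mapping of group names to datanode names plus a per-node utilization percentage; `choose_targets`, `max_util` and the 90% cut-off are illustrative only and are not part of HDFS or its block placement policy. Nodes in the failed node's own group are preferred while they have room, and the least-utilized nodes of other groups are used only as a fallback. Racks and failure zones are deliberately not modeled.

```python
from typing import Dict, Iterable, List, Set


def choose_targets(
    groups: Dict[str, List[str]],
    utilization: Dict[str, float],
    group: str,
    exclude: Set[str],
    count: int,
    max_util: float = 90.0,
) -> List[str]:
    """Pick up to `count` datanodes to receive replicas lost with a failed node.

    groups:      hypothetical mapping of group name -> member datanode names
    utilization: datanode name -> percent of capacity in use
    group:       group the failed datanode belonged to
    exclude:     datanodes that are dead or already hold a replica
    max_util:    illustrative cut-off above which a node is considered full
    """

    def candidates(names: Iterable[str]) -> List[str]:
        # Keep usable, non-full nodes, least utilized first.
        usable = [n for n in names if n not in exclude and utilization[n] < max_util]
        return sorted(usable, key=lambda n: utilization[n])

    # Prefer the failed node's own group while it still has room...
    same_group = candidates(groups[group])
    # ...and only spill over to the least-loaded nodes of other groups.
    other_groups = candidates(
        n for g, members in groups.items() if g != group for n in members
    )
    return (same_group + other_groups)[:count]
```

Spilling over only when the group is full keeps the extra replication traffic away from other groups' apps in the common case, which is the trade-off raised above.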

> Make balancer able to balance data among specified servers
> ----------------------------------------------------------
>                 Key: HDFS-6010
>                 URL: https://issues.apache.org/jira/browse/HDFS-6010
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 2.3.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>         Attachments: HDFS-6010-trunk.patch
> Currently, the balancer tool balances data among all datanodes. However, in some cases
we need to balance data only among specified nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.
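
The proposal can be illustrated with a minimal Python sketch, assuming a simplified model in which each datanode is reduced to its capacity and used bytes. Like the stock balancer, it compares each node's utilization against the average utilization plus or minus a threshold percentage (the balancer's default threshold is 10). The `servers` argument is a hypothetical stand-in for the proposed "-servers" option, and `DataNode`/`classify` are illustrative names, not Hadoop APIs or the attached patch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class DataNode:
    """Simplified datanode: only capacity (must be positive) and used bytes."""
    name: str
    capacity: int
    used: int

    @property
    def utilization(self) -> float:
        # Percentage of capacity in use.
        return 100.0 * self.used / self.capacity


def classify(
    nodes: Iterable[DataNode],
    servers: Optional[Set[str]] = None,
    threshold: float = 10.0,
) -> Tuple[float, List[str], List[str]]:
    """Return (average utilization, over-utilized names, under-utilized names).

    `servers` is a hypothetical stand-in for the proposed "-servers" option:
    when given, only those datanodes take part, and the average is computed
    over that subset instead of the whole cluster.
    """
    selected = [n for n in nodes if servers is None or n.name in servers]
    total_capacity = sum(n.capacity for n in selected)
    if total_capacity == 0:
        raise ValueError("no capacity among the selected datanodes")
    avg = 100.0 * sum(n.used for n in selected) / total_capacity
    over = [n.name for n in selected if n.utilization > avg + threshold]
    under = [n.name for n in selected if n.utilization < avg - threshold]
    return avg, over, under
```

Restricting the node set changes the average itself, so a node that looks balanced against the whole cluster may be under-utilized relative to its chosen peers, and nodes outside the set are never picked as sources or targets.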

This message was sent by Atlassian JIRA
