hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11742) Improve balancer usability after HDFS-8818
Date Fri, 05 May 2017 23:24:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999105#comment-15999105 ]

Kihwal Lee commented on HDFS-11742:

bq. I probably still not.
I invite everyone to run the Balancer post-HDFS-8818 on their moderately sized clusters with the
existing or default settings.  This includes you, [~szetszwo].  I am not talking about the quick
expansion type of balancing that you seem to focus on, but steady-state balancing.

bq. BTW, the replaceblockoperationspersec metrics you shown earlier. Is it just for one datanode?
Have you checked the other datanodes?
This is an aggregate across all nodes.

bq.  I am not sure it is the right approach since the datanode pairs are sorted by priorities
according to the utilization and data locality. The patch tries to schedule the same number
of threads to all pairs.

In HDFS-8818, a thread pool is created per target, not per pair, and the same fixed number of
threads is assigned to each pool. Is that not so?  My patch does not change this.
I am simply adjusting the size of each thread pool so that it does not exceed the limit, which
avoids the skipping problem. Once moves start being skipped, the throughput can go down.
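
To illustrate the sizing adjustment described above, here is a minimal Java sketch. It is not the
attached patch: the class name PoolSizing, the method threadsPerTarget and all parameter names are
hypothetical, and it assumes that "the limit" is a per-datanode cap on concurrent moves and that
the balancer divides its mover threads evenly among targets.

```java
// Hypothetical sketch; names are illustrative and not taken from the HDFS-11742 patch.
public final class PoolSizing {
    private PoolSizing() {}

    /**
     * Returns the thread-pool size to use for each target datanode.
     *
     * @param moverThreads              total mover threads available to the balancer (assumed)
     * @param targetCount               number of target datanodes in this iteration (assumed)
     * @param maxConcurrentMovesPerNode assumed per-datanode cap on concurrent moves
     */
    public static int threadsPerTarget(int moverThreads, int targetCount,
                                       int maxConcurrentMovesPerNode) {
        if (moverThreads <= 0 || targetCount <= 0 || maxConcurrentMovesPerNode <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Same fixed share for every target; 0 if there are more targets than threads.
        int evenShare = moverThreads / targetCount;
        // Never size a pool above the per-datanode cap.
        return Math.min(evenShare, maxConcurrentMovesPerNode);
    }

    public static void main(String[] args) {
        System.out.println(threadsPerTarget(1000, 10, 50));   // even share 100, capped
        System.out.println(threadsPerTarget(1000, 100, 50));  // even share 10, under the cap
    }
}
```

Every target still gets the same pool size in this sketch; the only change from an even split is
the cap applied by Math.min.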

> Improve balancer usability after HDFS-8818
> ------------------------------------------
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: balancer2.8.png, HDFS-11742.branch-2.8.patch, HDFS-11742.branch-2.patch,
>                      HDFS-11742.trunk.patch, HDFS-11742.v2.trunk.patch
> We ran the 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node cluster. In both cases,
> it would hang forever after two iterations. The two iterations were also moving things at
> a significantly lower rate. The hang itself is fixed by HDFS-11377, but the design limitation
> remains, so the balancer throughput ends up actually lower.
> Instead of reverting HDFS-8818 as originally suggested, I am making a small change to
> make it less error prone and more usable.

This message was sent by Atlassian JIRA
