hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11742) Improve balancer usability after HDFS-8818
Date Mon, 05 Jun 2017 15:05:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037072#comment-16037072 ]

Kihwal Lee commented on HDFS-11742:
-----------------------------------

!https://issues.apache.org/jira/secure/attachment/12871245/balancer_fix.png!

[~shv], here is a graph. 

HDFS-8818 + HDFS-11377 can potentially move blocks faster, but only if the number of mover
threads is set very high. "High" is relative to the size of the cluster and depends on the
nature of the imbalance. In one of our clusters (~2,500 nodes), setting it to 10,000 wasn't
enough. The balancer does create 10,000 threads, but only a subset of them are utilized.
Nicholas previously suggested 30,000, and while that would have "worked", it effectively means
HDFS-8818 requires the mover thread limit to be removed.

What I did here is honor the configured mover thread limit (default=1,000) and size a thread
pool accordingly (#movers / #targets) instead of using a fixed number (default max=50). I've
verified it works as well as, and sometimes better than, the 2.7 balancer with an identical config.
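
To make the sizing rule concrete, here is a minimal sketch of the (#movers / #targets)
arithmetic. It is not taken from the attached patches: the class name, the method name, the
floor of one thread, and the reading that the result sizes one pool per target are all
assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; not the code from the HDFS-11742 patches.
class MoverPoolSizing {

    /**
     * Splits a global mover-thread budget evenly across targets.
     * The floor of 1 is an assumption: when there are more targets than
     * threads, the total across all pools will exceed the configured limit.
     */
    public static int perTargetPoolSize(int moverThreadLimit, int numTargets) {
        if (moverThreadLimit <= 0 || numTargets <= 0) {
            throw new IllegalArgumentException("limit and target count must be positive");
        }
        return Math.max(1, moverThreadLimit / numTargets);
    }

    public static void main(String[] args) {
        int moverThreadLimit = 1000; // default limit quoted in the comment above
        int numTargets = 40;         // hypothetical number of targets in one iteration

        int poolSize = perTargetPoolSize(moverThreadLimit, numTargets);

        // A pool sized from the budget instead of a fixed maximum.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        System.out.println("threads per target: " + poolSize);
        pool.shutdown();
    }
}
```

With these example numbers, 1,000 movers over 40 targets gives 25 threads per pool, so the
total stays within the configured limit rather than depending on a fixed per-pool maximum.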

> Improve balancer usability after HDFS-8818
> ------------------------------------------
>
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: balancer2.8.png, balancer_fix.png, HDFS-11742.branch-2.8.patch,
HDFS-11742.branch-2.patch, HDFS-11742.trunk.patch, HDFS-11742.v2.trunk.patch
>
>
> We ran 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node cluster. In both cases,
it would hang forever after two iterations. The two iterations were also moving things at
a significantly lower rate. The hang itself is fixed by HDFS-11377, but the design limitation
remains, so the balancer throughput ends up actually lower.
> Instead of reverting HDFS-8818 as originally suggested, I am making a small change to
make it less error-prone and more usable.




