hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11742) Improve balancer usability after HDFS-8188
Date Fri, 05 May 2017 16:35:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998534#comment-15998534 ]

Kihwal Lee commented on HDFS-11742:
-----------------------------------

Instead of reverting, I am making a simple change to make it more usable.  This will prevent
users from hitting the same issues we had.  The changes from HDFS-8188 do allow running the
balancer at a higher throughput, but getting there requires tuning multiple knobs.  And when
it runs slower than the previous release, users will have no clue why.  The
default config values may result in degraded performance for users running a cluster with
more than 20 nodes.

The main problem with HDFS-8188 is the way a thread pool is created per target.  If the thread
count reaches the limit (max mover threads), the remaining pending moves are simply dropped (or
even worse, the balancer hangs without HDFS-11377), leading to degraded performance as
demonstrated above with graphs.  The suggested workaround of "set the mover thread limit to
10,000 or 30,000" simply means removing the limit, i.e. the design cannot work with the limit in place.

The suggested improvement calculates the size of each mover thread pool, instead of using
the configured fixed value.  The total thread count limit is honored without causing the degradation
seen with the original design. 
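
As a rough illustration of the idea (not the code in the attached patches), the per-target
pool size can be derived from the total limit and the number of targets. The class name,
method name, parameters and numbers below are all hypothetical, and the floor of one thread
per target is an assumption of this sketch, not something stated in this comment.

```java
// Hypothetical sketch: derive each target's mover pool size from the total
// thread limit instead of using a fixed configured size per target.
final class MoverPoolSizing {
    private MoverPoolSizing() {}

    /**
     * @param maxMoverThreads total mover thread limit across all targets
     * @param numTargets      number of target datanodes in this iteration
     * @param maxPerTarget    upper bound on concurrent moves for one target
     * @return threads to allocate to each target's pool
     */
    public static int poolSizePerTarget(int maxMoverThreads, int numTargets, int maxPerTarget) {
        if (maxMoverThreads <= 0 || numTargets <= 0 || maxPerTarget <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Even share of the total limit (integer division rounds down).
        int share = maxMoverThreads / numTargets;
        // Never exceed the per-target bound. Assumption of this sketch: keep at
        // least one thread per target, which means the total can exceed the
        // limit when there are more targets than threads.
        return Math.max(1, Math.min(share, maxPerTarget));
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a limit of 1000 threads, at most 50 per target.
        System.out.println(poolSizePerTarget(1000, 20, 50));   // 50
        System.out.println(poolSizePerTarget(1000, 280, 50));  // 3
        System.out.println(poolSizePerTarget(1000, 2400, 50)); // 1
    }
}
```

With a fixed size of 50 per target, the same limit of 1000 would be used up by the first 20
targets and nothing would be left for the rest, which is the kind of degradation described above.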


> Improve balancer usability after HDFS-8188
> ------------------------------------------
>
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: balancer2.8.png, HDFS-11742.branch-2.8.patch, HDFS-11742.branch-2.patch, HDFS-11742.trunk.patch
>
>
> We ran 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node cluster. In both cases,
> it would hang forever after two iterations. The two iterations were also moving things at
> a significantly lower rate. The hang itself is fixed by HDFS-11377, but the design limitation
> remains, so the balancer throughput ends up actually lower.
> Instead of reverting HDFS-8188 as originally suggested, I am making a small change to
> make it less error prone and more usable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

