hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11742) Revert the core changes from HDFS-8818
Date Tue, 02 May 2017 23:25:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994001#comment-15994001 ]

Kihwal Lee commented on HDFS-11742:


I ran the Balancer with the suggested revert and then with HDFS-11377. I won't bother to post
plots for pre-HDFS-11377; it barely registers. As you can see, it is still not as good as the
revert. Sure, it might work well for certain cases, but it clearly performs poorly on all the
clusters we have tried. If 2.8.1 is put up for a vote with this, I will have to -1 the release.

bq. you may change it by setting dfs.datanode.balance.max.concurrent.moves.
It is not feasible to tune it per cluster. 

bq. What we need is HDFS-7639
I agree that dispatching needs to be asynchronous. But I don't see HDFS-8818 as a stepping
stone or prerequisite. Since we are trying to release 2.8.1, I suggest that HDFS-8818 be reverted
and the improvement be redesigned.
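
For readers unfamiliar with the distinction, here is a minimal sketch of blocking versus
asynchronous dispatch. It is a generic illustration only, not the Balancer's dispatcher code:
`move_block`, the sleep-based "transfer", and the `max_concurrent` value are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def move_block(block_id, seconds=0.05):
    """Hypothetical stand-in for one block transfer; it only sleeps."""
    time.sleep(seconds)
    return block_id


def dispatch_blocking(blocks):
    """Wait for each move to finish before starting the next one."""
    return [move_block(b) for b in blocks]


def dispatch_async(blocks, max_concurrent=4):
    """Submit every move up front and collect results afterwards.

    At most max_concurrent moves run at once (an assumed cap, chosen
    only for this illustration).
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(move_block, b) for b in blocks]
        return [f.result() for f in futures]


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(dispatch_blocking, list(range(8)))` and `timed(dispatch_async, list(range(8)))`
should return the same block list, with the asynchronous version finishing sooner because the
simulated moves overlap.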

> Revert the core changes from HDFS-8818
> --------------------------------------
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: balancer2.8.png, HDFS-11742.branch-2.8.patch, HDFS-11742.branch-2.patch,
> This is to revert the core changes made by HDFS-8818. The reason is explained in the
> jira comments.  HDFS-8818 put in config and logging changes that are tied to the core change.
> I will leave them as is.
> We ran 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node cluster. In both cases,
> it would hang forever after two iterations. The two iterations were also moving things at
> a significantly lower rate. The hang itself is fixed by HDFS-11377, but the design limitation
> remains, so the balancer throughput ends up actually lower.
