hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11015) Enforce timeout in balancer
Date Tue, 25 Oct 2016 17:21:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605901#comment-15605901 ]

Zhe Zhang edited comment on HDFS-11015 at 10/25/16 5:21 PM:

+1 on the updated patch. Thanks [~kihwal].

I just committed the patch to trunk. If you are OK with it, I also plan to commit it to branch-2 through branch-2.7. I also created HDFS-11051 to enhance testing around Balancer slow block moves.

was (Author: zhz):
+1 on the updated patch. Thanks Kihwal.

> Enforce timeout in balancer
> ---------------------------
>                 Key: HDFS-11015
>                 URL: https://issues.apache.org/jira/browse/HDFS-11015
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, HDFS-11015-3.patch, balancer.png
> 1) Hung node detection: HDFS-6247 removed the socket read timeout when it added the periodic response for slow block moves. However, removing the long timeout was not necessary: the timeout is still useful for detecting hung nodes, and it does not abort slow moves.
> 2) Enforcing the iteration limit: The 20-minute iteration limit is supposed to be enforced, but it is not. An iteration can easily stretch to 30 to 40 minutes with a long tail, and these long tails keep the balancer's throughput from reaching its full potential.
> 3) Slow move detection: For improved throughput, imposing a block move timeout is sometimes necessary. We have seen an iteration take over 2 hours because of one slow block move. This is mainly for catching exceptionally slow moves. Even if the balancer stops waiting, the move will continue and finish.
> To avoid undoing what HDFS-6247 tried to achieve, it should be possible to turn off 3) through configuration.
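Items 2) and 3) describe two distinct waits: a hard limit on the whole iteration, and an optional per-move timeout after which the balancer stops waiting without aborting the move. The Java sketch below models both with plain java.util.concurrent futures. It is a hypothetical illustration under stated assumptions, not the code in the HDFS-11015 patches: the names BalancerWaitSketch, Move, waitForMoves and startMove are invented here, a block move is simulated by a sleeping task, and a per-move timeout of 0 stands in for "turned off through configuration".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Hadoop's Balancer/Dispatcher code) of a hard
 * iteration deadline combined with an optional per-move wait timeout.
 */
public class BalancerWaitSketch {

    /** A scheduled (simulated) block move and the time it was started. */
    static final class Move {
        final Future<?> future;
        final long startNanos;

        Move(Future<?> future, long startNanos) {
            this.future = future;
            this.startNanos = startNanos;
        }
    }

    /**
     * Polls until every move has finished or exceeded its per-move timeout,
     * or until the iteration limit expires. moveTimeoutMs <= 0 disables the
     * per-move timeout. Moves are never cancelled: we only stop waiting.
     * Returns the number of moves observed as finished.
     */
    static int waitForMoves(List<Move> moves, long iterationStartNanos, long iterationLimitMs,
                            long moveTimeoutMs, long pollMs) throws InterruptedException {
        long limitNs = TimeUnit.MILLISECONDS.toNanos(iterationLimitMs);
        long moveTimeoutNs = TimeUnit.MILLISECONDS.toNanos(moveTimeoutMs);
        while (true) {
            long now = System.nanoTime();
            int stillWaiting = 0;
            for (Move m : moves) {
                boolean tooSlow = moveTimeoutMs > 0 && now - m.startNanos > moveTimeoutNs;
                if (!m.future.isDone() && !tooSlow) {
                    stillWaiting++;
                }
            }
            // Stop when no move is worth waiting for, or the iteration limit is hit.
            if (stillWaiting == 0 || now - iterationStartNanos >= limitNs) {
                break;
            }
            Thread.sleep(pollMs);
        }
        int finished = 0;
        for (Move m : moves) {
            if (m.future.isDone()) {
                finished++;
            }
        }
        return finished;
    }

    /** Starts a simulated block move that just sleeps for durationMs. */
    static Move startMove(ExecutorService pool, long durationMs) {
        long start = System.nanoTime();
        Future<?> f = pool.submit(() -> {
            try {
                Thread.sleep(durationMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return new Move(f, start);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Move> moves = new ArrayList<>();
        moves.add(startMove(pool, 50));   // normal move
        moves.add(startMove(pool, 3000)); // exceptionally slow move
        long start = System.nanoTime();
        // 20-minute iteration limit, 500 ms per-move timeout, poll every 20 ms.
        int finished = waitForMoves(moves, start, 20 * 60 * 1000L, 500, 20);
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("finished=" + finished + " waitedMs=" + waitedMs
                + " slowMoveStillRunning=" + !moves.get(1).future.isDone());
        pool.shutdownNow();
    }
}
```

Because the waiter only stops polling a slow move instead of cancelling its Future, the move keeps running in the background, which mirrors the point in 3) that the move continues and finishes even after the balancer stops waiting. Passing 0 as the per-move timeout leaves only the iteration deadline in force.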

This message was sent by Atlassian JIRA

