Date: Tue, 25 Oct 2016 18:22:58 +0000 (UTC)
From: "Zhe Zhang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11015) Enforce timeout in balancer

    [ https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606056#comment-15606056 ]

Zhe Zhang commented on HDFS-11015:
----------------------------------

Actually just noticed the original target version is 2.8. I committed to branch-2 and branch-2.8.
Backporting to branch-2.7 is not very clean. I'm working on it.

> Enforce timeout in balancer
> ---------------------------
>
>                 Key: HDFS-11015
>                 URL: https://issues.apache.org/jira/browse/HDFS-11015
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 2.8.0, 3.0.0-alpha2
>
>         Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, HDFS-11015-3.patch, balancer.png
>
>
> 1) Hung node detection: HDFS-6247 removed the socket read timeout while adding the periodic response for slow block moves. However, removing the long timeout wasn't necessary. The timeout is still useful for avoiding hung nodes and does not abort slow moves.
> 2) Enforcing the iteration limit: The 20-minute iteration limit is supposed to be enforced, but it is not. An iteration can easily stretch to 30 to 40 minutes with a long tail. Because of the long tails, the balancer throughput does not reach its full potential.
> 3) Slow move detection: For improved throughput, imposing a block move timeout is sometimes necessary. We have seen an iteration take over 2 hours because of one slow block move. This is mainly for catching exceptionally slow moves. Even if the balancer stops waiting, the move will continue and finish.
> In order to not undo what HDFS-6247 tried to achieve, it should be possible to turn off 3) through configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
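
Items 2) and 3) of the issue description can be illustrated with a small, language-neutral sketch. This is not the balancer's code and not the HDFS-11015 patch: `Move`, `FakeClock`, `wait_for_moves`, and every parameter name are hypothetical, and the moves are simulated rather than real block transfers. It shows only the waiting logic the description asks for: stop waiting for any single move after an optional per-move timeout, and stop the whole iteration at a hard deadline. Hung node detection (item 1) depends on a socket read timeout and is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Move:
    """A simulated in-flight block move (hypothetical, not a Hadoop class)."""
    block_id: str
    started_at: float   # seconds on the injected clock
    finishes_at: float  # time at which the simulated move completes


class FakeClock:
    """Deterministic clock so the sketch runs instantly and repeatably."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def wait_for_moves(
    moves: List[Move],
    now: Callable[[], float],
    sleep: Callable[[float], None],
    iteration_deadline: float,
    move_timeout: Optional[float] = None,  # None turns off slow move detection
    poll_interval: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Wait for moves, returning (completed, no_longer_waited_for) block ids."""
    pending = list(moves)
    completed: List[str] = []
    abandoned: List[str] = []
    while pending:
        t = now()
        still_pending = []
        for m in pending:
            if m.finishes_at <= t:
                completed.append(m.block_id)
            elif move_timeout is not None and t - m.started_at >= move_timeout:
                # Stop waiting only; the move itself is not cancelled.
                abandoned.append(m.block_id)
            else:
                still_pending.append(m)
        pending = still_pending
        if not pending:
            break
        if t >= iteration_deadline:
            # Enforce the iteration limit: give up on whatever is left.
            abandoned.extend(m.block_id for m in pending)
            break
        sleep(poll_interval)
    return completed, abandoned


if __name__ == "__main__":
    clock = FakeClock()
    moves = [Move("b1", 0, 5), Move("b2", 0, 500), Move("b3", 0, 50)]
    done, dropped = wait_for_moves(
        moves, clock.now, clock.sleep, iteration_deadline=100, move_timeout=30
    )
    print("completed:", done, "stopped waiting for:", dropped, "at t =", clock.now())
```

With `move_timeout=None` the per-move check is skipped and only the iteration deadline applies, which mirrors the request that 3) can be turned off so slow moves are still tolerated within an iteration.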