Date: Wed, 1 Mar 2017 20:17:45 +0000 (UTC)
From: "Benoy Antony (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

[ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890957#comment-15890957 ]

Benoy Antony commented on HDFS-11384:
-------------------------------------

Sleeping inside the *synchronized* block should be avoided, as it prevents other threads from obtaining the lock while the thread is sleeping. The tradeoff between sleeping for a fixed time and sleeping for a variable time is that the variable approach complicates the code. Since the delay is not applied by default, it is okay to sleep for a fixed interval after getBlocks().

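To illustrate the point about not sleeping under the lock, here is a minimal sketch. It is not the Balancer's actual code and not the attached patch: the class ThrottledFetcher, its fields, and the stub getBlocks() are all hypothetical names made up for this example. It only shows the shape being suggested: update shared state inside the synchronized block, leave the block, and then sleep for a fixed interval that defaults to off.

```java
/**
 * Hypothetical sketch only; names and structure are assumptions,
 * not taken from the Balancer or from HDFS-11384.001.patch.
 */
final class ThrottledFetcher {
    private final Object lock = new Object();
    private final long delayMillis;          // 0 means no delay (assumed default)
    private long fetched = 0;                // shared state guarded by lock
    private boolean sleptWhileHoldingLock = false;

    ThrottledFetcher(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Stand-in for the getBlocks() call; the real one is an RPC to the NameNode. */
    private long getBlocks() {
        return 1L;
    }

    /** Fetches once, then pauses for the fixed interval outside the lock. */
    long fetchOnce() {
        long total;
        synchronized (lock) {
            fetched += getBlocks();
            total = fetched;
        }
        // The lock is released here, so other threads are not blocked by the sleep.
        if (delayMillis > 0) {
            // Record whether this thread still owns the monitor (it should not).
            sleptWhileHoldingLock |= Thread.holdsLock(lock);
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
        return total;
    }

    boolean sleptWhileHoldingLock() {
        return sleptWhileHoldingLock;
    }
}
```

With a fixed interval there is nothing to compute per call, which is the simplicity argued for above. A variable delay would need extra bookkeeping (for example, tracking elapsed time since the previous call).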
> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the NameNode's rpc.CallQueueLength to spike. We observed that this situation can cause an HBase cluster failure due to RegionServer WAL timeouts.