Date: Tue, 25 Apr 2017 02:02:04 +0000 (UTC)
From: "Konstantin Shvachko (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

[ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-11384:
---------------------------------------
    Attachment: HDFS-11384.008.patch

# It took some time to reproduce the failures, since I did not see any on my local box. It looks like the solution is to mock FSNamesystem before starting the DataNodes. Otherwise the behavior is non-deterministic. I changed it and now it runs consistently on my local box. Let's try Jenkins.
# The findbugs warnings are not related to the patch.
# There are 2 checkstyle warnings.
#* One complains that the number of parameters in doTest() is more than 7. I don't know why 7 is the magic number, but there were already 8 parameters in doTest() and I added one.
#* The second is about inner assignment, which is intentional in this case, because I want the two variables to initially have the same value, and splitting the line into two statements would remove that meaning.
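
For readers unfamiliar with the inner-assignment warning mentioned in the comment above, here is a minimal sketch of the two forms. The class, method, and variable names are hypothetical and are not taken from the HDFS-11384 patch; the sketch only illustrates the general trade-off being described.

```java
public class InnerAssignmentSketch {
    // Hypothetical illustration; names are not from the actual patch.

    // Chained (inner) assignment: one statement makes it explicit that
    // both variables start from the same value.
    static long[] chained(long start) {
        long first;
        long second;
        first = second = start; // the form checkstyle's inner-assignment check flags
        return new long[] {first, second};
    }

    // Split form: same result, but the "these must start equal" intent
    // is no longer expressed by the code itself.
    static long[] split(long start) {
        long first = start;
        long second = start;
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        long[] a = chained(42L);
        long[] b = split(42L);
        System.out.println(a[0] == a[1] && b[0] == b[1]);
    }
}
```

Both forms behave identically at runtime; the difference is only in how clearly the shared starting value is conveyed to a reader.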
> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: Konstantin Shvachko
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, HDFS-11384.004.patch, HDFS-11384.005.patch, HDFS-11384.006.patch, HDFS-11384-007.patch, HDFS-11384.008.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes a spike in the NameNode's rpc.CallQueueLength. We observed that this can cause HBase cluster failure due to RegionServer WAL timeouts.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)