Date: Tue, 16 May 2017 13:12:04 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-14412) HostsFileReader#getHostDetails is very expensive on large clusters

     [ https://issues.apache.org/jira/browse/HADOOP-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HADOOP-14412:
--------------------------------
    Attachment: HADOOP-14412-branch-2.002.patch

Uploading the same branch-2 patch again to trigger a Jenkins run.

> HostsFileReader#getHostDetails is very expensive on large clusters
> ------------------------------------------------------------------
>
>                 Key: HADOOP-14412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14412
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>   Affects Versions: 2.8.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: HADOOP-14412.001.patch, HADOOP-14412.002.patch, HADOOP-14412-branch-2.001.patch, HADOOP-14412-branch-2.002.patch, HADOOP-14412-branch-2.002.patch, HADOOP-14412-branch-2.8.002.patch
>
>
> After upgrading one of our large clusters to 2.8 we noticed many IPC server threads of the resourcemanager spending time in NodesListManager#isValidNode which in turn was calling HostsFileReader#getHostDetails.
> The latter is creating complete copies of the include and exclude sets for every node heartbeat, and these sets are not small due to the size of the cluster.
> These copies are causing multiple resizes of the underlying HashSets being filled and creating lots of garbage.
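
For illustration only, here is a minimal Java sketch of the cost pattern described above and one common way to avoid it: build an immutable snapshot of the include and exclude sets once, when the lists are refreshed, and hand the same snapshot to every caller instead of copying per heartbeat. The class HostListSketch, its Snapshot type, and all method names are hypothetical and are not the Hadoop classes; the validity rule in isValidNode (an empty include list admits every host) is likewise an assumption. This is not necessarily what the attached patches do.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a hosts-file reader; not the Hadoop class. */
final class HostListSketch {

  /** Immutable snapshot of the include and exclude lists. */
  static final class Snapshot {
    private final Set<String> includes;
    private final Set<String> excludes;

    Snapshot(Set<String> includes, Set<String> excludes) {
      // Copy once, at refresh time, and wrap so callers cannot mutate.
      this.includes = Collections.unmodifiableSet(new HashSet<>(includes));
      this.excludes = Collections.unmodifiableSet(new HashSet<>(excludes));
    }

    Set<String> getIncludes() { return includes; }
    Set<String> getExcludes() { return excludes; }
  }

  private final AtomicReference<Snapshot> current = new AtomicReference<>(
      new Snapshot(Collections.<String>emptySet(), Collections.<String>emptySet()));

  /** Called rarely, when the hosts files change. */
  void refresh(Set<String> includes, Set<String> excludes) {
    current.set(new Snapshot(includes, excludes));
  }

  /** Expensive pattern: two full set copies on every call. */
  void copyInto(Set<String> includes, Set<String> excludes) {
    Snapshot s = current.get();
    includes.addAll(s.getIncludes());
    excludes.addAll(s.getExcludes());
  }

  /** Cheap pattern: return the shared immutable snapshot, no copying. */
  Snapshot getSnapshot() {
    return current.get();
  }

  /** Per-heartbeat check using the shared snapshot (assumed semantics). */
  boolean isValidNode(String host) {
    Snapshot s = current.get();
    boolean included = s.getIncludes().isEmpty() || s.getIncludes().contains(host);
    return included && !s.getExcludes().contains(host);
  }
}
```

With this shape, the per-heartbeat path allocates nothing: the only set copies happen in refresh, which runs when the lists change rather than once per node heartbeat.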