Date: Thu, 11 May 2017 22:11:04 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

    [ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007289#comment-16007289 ]

Jason Lowe commented on YARN-4002:
----------------------------------

FYI we're finding this change to be quite expensive on large clusters. See HADOOP-14412.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> ---------------------------------------------------------
>
>                 Key: YARN-4002
>                 URL: https://issues.apache.org/jira/browse/YARN-4002
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Critical
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, YARN-4002-rwlock.patch, YARN-4002-rwlock-v2.patch, YARN-4002-rwlock-v2.patch, YARN-4002-rwlock-v3.patch, YARN-4002-rwlock-v3-rebase.patch, YARN-4002-rwlock-v4.patch, YARN-4002-rwlock-v5.patch, YARN-4002-rwlock-v6.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs.
> By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only updated on "refresh nodes". All RPC threads handling node heartbeats are only readers. So RWLock could be used to allow concurrent access by RPC threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are always updated by "reference assignment", which is atomic in Java, the reader side lock could just be skipped.
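
For readers of the archive, the two options in the description (a read/write lock versus skipping the reader-side lock) can be sketched roughly as below. This is an illustrative sketch only, not the code from any of the attached patches. The classes HostLists, RwLockNodeFilter and LockFreeNodeFilter are hypothetical names and are not the real HostsFileReader or NodesListManager. The rule that an empty include list admits every host is assumed here. The lock-free variant also assumes both sets are published together through a single volatile reference to an immutable snapshot: reference assignment is atomic in Java, but volatile is what makes the new reference visible to other threads, and a single reference keeps "includes" and "excludes" consistent with each other.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical immutable snapshot of the include and exclude lists. */
final class HostLists {
    final Set<String> includes;
    final Set<String> excludes;

    HostLists(Set<String> includes, Set<String> excludes) {
        // Defensive copies so a published snapshot can never change.
        this.includes = Collections.unmodifiableSet(new HashSet<>(includes));
        this.excludes = Collections.unmodifiableSet(new HashSet<>(excludes));
    }

    boolean isValid(String host) {
        // Assumed rule: an empty include list admits every host.
        boolean included = includes.isEmpty() || includes.contains(host);
        return included && !excludes.contains(host);
    }
}

/** Option 1: heartbeat threads share a read lock; refresh takes the write lock. */
final class RwLockNodeFilter {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private HostLists lists;

    RwLockNodeFilter(HostLists initial) {
        this.lists = initial;
    }

    boolean isValidNode(String host) {
        lock.readLock().lock();
        try {
            return lists.isValid(host);
        } finally {
            lock.readLock().unlock();
        }
    }

    void refresh(HostLists updated) {
        lock.writeLock().lock();
        try {
            lists = updated;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

/** Option 2: no reader-side lock; readers see the old or the new snapshot. */
final class LockFreeNodeFilter {
    private volatile HostLists lists;

    LockFreeNodeFilter(HostLists initial) {
        this.lists = initial;
    }

    boolean isValidNode(String host) {
        // A single volatile read yields one consistent snapshot.
        return lists.isValid(host);
    }

    void refresh(HostLists updated) {
        lists = updated;
    }
}
```

In both variants a heartbeat thread calls isValidNode and the "refresh nodes" path calls refresh. Which one is cheaper on a large cluster depends on the workload and would need measuring.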