Return-Path:
X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E228E17526 for ; Mon, 30 Mar 2015 18:12:02 +0000 (UTC)
Received: (qmail 44851 invoked by uid 500); 30 Mar 2015 18:12:02 -0000
Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org
Received: (qmail 44811 invoked by uid 500); 30 Mar 2015 18:12:02 -0000
Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: yarn-issues@hadoop.apache.org
Delivered-To: mailing list yarn-issues@hadoop.apache.org
Received: (qmail 44799 invoked by uid 99); 30 Mar 2015 18:12:02 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 30 Mar 2015 18:12:02 +0000
Date: Mon, 30 Mar 2015 18:12:02 +0000 (UTC)
From: "Sunil G (JIRA)"
To: yarn-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387114#comment-14387114 ]

Sunil G commented on YARN-3136:
-------------------------------

Hi [~jlowe] [~jianhe]

{noformat}
Bug type IS2_INCONSISTENT_SYNC (click for details)
In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler
Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications
Synchronized 90% of the time
Unsynchronized access at AbstractYarnScheduler.java:[line 138]
Unsynchronized access at AbstractYarnScheduler.java:[line 165]
Unsynchronized access at AbstractYarnScheduler.java:[line 233]
{noformat}

As "applications" is now a concurrent version, I feel we do not need a lock for these accesses. Kindly share your opinion.

The test case failure is not related.

> getTransferredContainers can be a bottleneck during AM registration
> -------------------------------------------------------------------
>
>                 Key: YARN-3136
>                 URL: https://issues.apache.org/jira/browse/YARN-3136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
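
For illustration only, here is a minimal sketch of the idea in the comment above: once the field is a concurrent map, a read that performs a single lookup does not need the scheduler lock. The class SchedulerSketch, its method bodies, and the String-based IDs are hypothetical stand-ins and are not taken from AbstractYarnScheduler or from any of the attached patches.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a scheduler; NOT the actual AbstractYarnScheduler code.
class SchedulerSketch {
    // A concurrent map: each individual get/put is thread-safe on its own.
    private final ConcurrentHashMap<String, List<String>> applications =
        new ConcurrentHashMap<>();

    // Writer path: kept synchronized here to mimic a method that still
    // holds the (contended) scheduler lock for other bookkeeping.
    synchronized void addApplication(String appId, List<String> containers) {
        applications.put(appId, new CopyOnWriteArrayList<>(containers));
    }

    // Reader path: one map lookup, no scheduler lock taken.
    List<String> getTransferredContainers(String appId) {
        List<String> containers = applications.get(appId);
        return containers == null ? Collections.<String>emptyList() : containers;
    }
}
```

This only covers single-lookup reads. A compound check-then-act sequence on the map would still need either an atomic map method such as putIfAbsent or computeIfAbsent, or an external lock.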