Date: Wed, 5 Jun 2013 10:31:23 +0000 (UTC)
From: "nemon lou (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

    [ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675778#comment-13675778 ]

nemon lou commented on YARN-276:
--------------------------------

Hi Thomas, thank you for your review.
The "used resource not showing up" issue seems to be a bug that already exists. I will file another JIRA for it. (Resource.java's toString() method uses the symbols "<>", which browsers ignore.)
The "divide by zero exception" problem has not been fixed, as I haven't found which piece of code can cause it.
The other review comments will be addressed in the latest patch. Thanks.
After reconsidering the user limit, I found that the property "maxAMResourcePerQueuePerUserPercent" that I added is not a proper one. It will be removed, and maxAMResourcePerQueue will be checked for each user instead.

> Capacity Scheduler can hang when submit many jobs concurrently
> --------------------------------------------------------------
>
>                 Key: YARN-276
>                 URL: https://issues.apache.org/jira/browse/YARN-276
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: nemon lou
>            Assignee: nemon lou
>              Labels: incompatible
>         Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler can hang: most resources are taken up by AMs, leaving too few for tasks, and then all applications hang.
> The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" is not checked directly. Instead, this property is only used to compute maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from the resources the AMs actually use).
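
The cause described in the issue can be illustrated with a small model. This is only a sketch under stated assumptions, not the scheduler's actual code: the function names, the 100 GB cluster, the 1 GB minimum allocation, the 2 GB AM size, and the 50% AM limit are all hypothetical values chosen for illustration. It compares a count-based limit derived from the minimum allocation with a check against the memory the AMs actually hold.

```python
import math


def max_active_apps_by_min_alloc(cluster_mb, min_alloc_mb, max_am_percent):
    """Count-based limit (hypothetical model): assumes each AM uses min_alloc_mb."""
    slots = cluster_mb // min_alloc_mb
    return max(math.ceil(slots * max_am_percent), 1)


def admit_by_am_usage(cluster_mb, am_mb, max_am_percent, submitted):
    """Direct check (hypothetical model): admit AMs until their real usage hits the cap."""
    am_cap_mb = cluster_mb * max_am_percent
    am_used_mb = 0
    admitted = 0
    for _ in range(submitted):
        if am_used_mb + am_mb > am_cap_mb:
            break  # activating this AM would exceed the AM share
        am_used_mb += am_mb
        admitted += 1
    return admitted


# Illustrative numbers only; none of them come from the issue.
cluster_mb = 100 * 1024   # total cluster memory
min_alloc_mb = 1024       # minimum allocation
am_mb = 2048              # memory each AM really requests
max_am_percent = 0.5      # intended AM share of the cluster

count_limit = max_active_apps_by_min_alloc(cluster_mb, min_alloc_mb, max_am_percent)
am_mb_under_count_limit = count_limit * am_mb

admitted = admit_by_am_usage(cluster_mb, am_mb, max_am_percent, submitted=200)
am_mb_under_direct_check = admitted * am_mb

print(count_limit, am_mb_under_count_limit)   # AMs allowed and memory they hold
print(admitted, am_mb_under_direct_check)
```

With these numbers the count-based limit allows 50 AMs, which at 2 GB each hold the entire cluster and leave nothing for tasks. The direct check stops at 25 AMs and keeps half of the memory free.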