Date: Mon, 13 Apr 2015 20:33:13 +0000 (UTC)
From: "Nathan Roberts (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

    [ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493020#comment-14493020 ]

Nathan Roberts commented on YARN-3388:
--------------------------------------

Thanks [~leftnoteasy] for the comments.

{quote}
when doing allocation under a labeled node, user-limit checking in the patch is incorrect.
{quote}

I don't think it's any more incorrect than it was prior to the patch. Both trunk and this patch use queueUsage.getUsed() to calculate currentCapacity.
IIUC, this is wrong when looking at labeled nodes. Trunk also uses the partition from the resource request rather than the partition of the node being evaluated, which I think is also incorrect. I think it's more correct after YARN-3361, but that isn't in yet. I don't think I made things any worse than trunk is today, but I can wait until YARN-3361 is in if that will make things easier.

I can change the name to include Dominant.

The test case you mention should be in there. Without the fix, the following assert fails because we can't get above the queue's capacity of 80%:

{code}
assertTrue(
    "Expected AbsoluteUsedCapacity > 0.95, got: " + b.getAbsoluteUsedCapacity(),
    b.getAbsoluteUsedCapacity() > 0.95);
{code}

> Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3388
>                 URL: https://issues.apache.org/jira/browse/YARN-3388
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch
>
>
> When there are multiple active users in a queue, it should be possible for those users to make use of capacity up to max_capacity (or close to it). The resources should be fairly distributed among the active users in the queue. This works pretty well when a single resource is being scheduled. However, when there are multiple resources, the situation gets more complex and the current algorithm tends to get stuck at Capacity.
> Example illustrated in subsequent comment.
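
For readers unfamiliar with DRF, here is a minimal sketch of the "dominant share" idea that the multi-resource case depends on. It is not the CapacityScheduler or DominantResourceCalculator code; the class name, method name, and cluster sizes are all hypothetical and chosen only for illustration. The point is that with two resources a usage has two shares, and the larger one (the dominant share) is what gets compared.

```java
public class DominantShareSketch {

    /**
     * Returns the dominant share of a usage: the larger of its memory
     * fraction and its vcore fraction of the cluster total.
     * (Hypothetical helper, not a Hadoop API.)
     */
    static double dominantShare(long usedMemMb, long usedVcores,
                                long clusterMemMb, long clusterVcores) {
        double memShare = (double) usedMemMb / clusterMemMb;
        double cpuShare = (double) usedVcores / clusterVcores;
        return Math.max(memShare, cpuShare);
    }

    public static void main(String[] args) {
        // Assumed cluster size for the example: 100 GB of memory, 100 vcores.
        long clusterMemMb = 102_400;
        long clusterVcores = 100;

        // Memory-heavy usage: 80 GB and 10 vcores; memory is the dominant resource.
        double memHeavy = dominantShare(81_920, 10, clusterMemMb, clusterVcores);

        // CPU-heavy usage: 10 GB and 50 vcores; vcores are the dominant resource.
        double cpuHeavy = dominantShare(10_240, 50, clusterMemMb, clusterVcores);

        System.out.println("memory-heavy dominant share: " + memHeavy);
        System.out.println("cpu-heavy dominant share:    " + cpuHeavy);
    }
}
```

In this sketch the memory-heavy usage already sits at a dominant share of 0.8 while consuming only a tenth of the vcores, which shows how a limit checked against one resource can look exhausted while the other resource is mostly idle.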