Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Mon, 13 Apr 2015 20:33:13 +0000 (UTC)
From: "Nathan Roberts (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12784835.1427122951000.74683.1428957193623@Atlassian.JIRA>
In-Reply-To: <JIRA.12784835.1427122951000@Atlassian.JIRA>
References: <JIRA.12784835.1427122951000@Atlassian.JIRA>
 <JIRA.12784835.1427122951884@arcas>
Subject: [jira] [Commented] (YARN-3388) Allocation in LeafQueue could get
 stuck because DRF calculator isn't well supported when computing user-limit
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493020#comment-14493020 ] 

Nathan Roberts commented on YARN-3388:
--------------------------------------

Thanks [~leftnoteasy] for the comments. 

{quote}
when doing allocation under a labeled node, user-limit checking in the patch is incorrect.
{quote}
I don't think it's any more incorrect than it was prior to the patch. Both trunk and this patch use queueUsage.getUsed() to calculate currentCapacity. If I understand correctly, this is wrong when looking at labeled nodes. Trunk also uses the partition from the resource request rather than the partition of the node being evaluated, which I think is also incorrect. I think it will be more correct after YARN-3361, but that isn't in yet.

I don't think I made things any worse than trunk is today, but I can wait until YARN-3361 is in if that will make things easier. 

I can change the name to include Dominant.

The test case you mention should be in there. Without the fix, the following assert fails because we can't get above the queue's capacity of 80%:
{code}
    assertTrue(
        "Expected AbsoluteUsedCapacity > 0.95, got: "
            + b.getAbsoluteUsedCapacity(),
        b.getAbsoluteUsedCapacity() > 0.95);
{code}
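
For context, here is a minimal standalone sketch of the dominant-share idea that DRF is built on. It is not the CapacityScheduler or DominantResourceCalculator code; the class name, method name, parameters, and cluster sizes are all hypothetical and chosen only for illustration. It shows that the same usage can look very different depending on whether a single resource or the dominant resource is used for the comparison.

```java
// Hypothetical illustration only; none of these names come from Hadoop.
class DominantShareSketch {

    /**
     * Returns the larger of the memory share and the vcore share,
     * i.e. the "dominant" share of the cluster taken by this usage.
     */
    static double dominantShare(long usedMemMb, int usedVcores,
                                long clusterMemMb, int clusterVcores) {
        if (clusterMemMb <= 0 || clusterVcores <= 0) {
            throw new IllegalArgumentException("cluster totals must be positive");
        }
        double memShare = (double) usedMemMb / clusterMemMb;
        double cpuShare = (double) usedVcores / clusterVcores;
        return Math.max(memShare, cpuShare);
    }

    public static void main(String[] args) {
        // Assumed cluster: 100 GB of memory and 100 vcores.
        long clusterMemMb = 102400L;
        int clusterVcores = 100;

        // A CPU-heavy usage: 10 GB of memory but 50 vcores.
        long usedMemMb = 10240L;
        int usedVcores = 50;

        double memOnly = (double) usedMemMb / clusterMemMb;
        double dominant = dominantShare(usedMemMb, usedVcores,
                                        clusterMemMb, clusterVcores);

        System.out.println("memory-only share: " + memOnly);
        System.out.println("dominant share:    " + dominant);
    }
}
```

In this made-up case the memory-only view and the dominant view disagree by a factor of five, which is the kind of gap a limit calculation has to account for when more than one resource is scheduled.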


> Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3388
>                 URL: https://issues.apache.org/jira/browse/YARN-3388
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch
>
>
> When there are multiple active users in a queue, it should be possible for those users to make use of capacity up to max_capacity (or close to it). The resources should be fairly distributed among the active users in the queue. This works pretty well when a single resource is being scheduled. However, when there are multiple resources, the situation gets more complex and the current algorithm tends to get stuck at capacity.
> An example is illustrated in a subsequent comment.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)