hadoop-yarn-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
Date Mon, 13 Apr 2015 20:33:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493020#comment-14493020 ]

Nathan Roberts commented on YARN-3388:
--------------------------------------

Thanks [~leftnoteasy] for the comments. 

{quote}
when doing allocation under a labeled node, user-limit checking in the patch is incorrect.
{quote}
I don't think it's any more incorrect than it was prior to the patch. Both trunk and this
patch use queueUsage.getUsed() to calculate currentCapacity. If I understand correctly, this
is wrong when looking at labeled nodes. Trunk also uses the partition from the resource
request rather than the partition of the node being evaluated, which I think is also
incorrect. I think this will be more correct once YARN-3361 is in, but that's not there yet.

I don't think I made things any worse than trunk is today, but I can wait until YARN-3361
is in if that will make things easier. 

I can change the name to include Dominant.

The test case you mention should be in there. Without the fix, the following assert will fail
because we can't get above the queue's capacity of 80%:
{code}
    assertTrue(
        "Expected AbsoluteUsedCapacity > 0.95, got: "
            + b.getAbsoluteUsedCapacity(), b.getAbsoluteUsedCapacity() > 0.95);
{code}
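For anyone following along on the "Dominant" naming: the sketch below only illustrates the general idea of a dominant share under DRF, where usage is judged by whichever resource takes the largest fraction of the cluster. It is not code from the patch. The class name, method name, and cluster numbers are all made up for illustration.

```java
public class DominantShareSketch {

    /**
     * Hypothetical helper: returns the larger of the memory share and the
     * vcore share, i.e. the share of the dominant resource.
     */
    static double dominantShare(long usedMemMb, long usedVcores,
                                long clusterMemMb, long clusterVcores) {
        double memShare = (double) usedMemMb / clusterMemMb;
        double cpuShare = (double) usedVcores / clusterVcores;
        return Math.max(memShare, cpuShare);
    }

    public static void main(String[] args) {
        // Assumed cluster: 100 GB of memory and 100 vcores.
        // Assumed usage: light on memory (20 GB), heavy on CPU (90 vcores).
        double share = dominantShare(20 * 1024, 90, 100 * 1024, 100);
        System.out.println("dominant share = " + share);
    }
}
```

In this made-up case a memory-only view reports 20% used while the dominant (CPU) view reports 90%, which is the kind of gap that matters when a limit is computed against one resource but allocation is constrained by another.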




> Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3388
>                 URL: https://issues.apache.org/jira/browse/YARN-3388
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch
>
>
> When there are multiple active users in a queue, it should be possible for those users
> to make use of capacity up to max_capacity (or close to it). The resources should be fairly
> distributed among the active users in the queue. This works pretty well when there is a
> single resource being scheduled. However, when there are multiple resources, the situation
> gets more complex and the current algorithm tends to get stuck at Capacity.
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
