hadoop-yarn-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
Date Wed, 15 Apr 2015 19:17:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496735#comment-14496735 ]

Thomas Graves commented on YARN-3434:
-------------------------------------

I am not saying the child needs to know how the parent calculates the resource limit.  I am saying
that the user limit, and whether it needs to unreserve to make another reservation, has nothing
to do with the parent queue (i.e. it doesn't apply to the parent queue).  Remember I don't need
to store the user limit; I need to store whether it needs to unreserve and, if it does,
how much it needs to unreserve.

When a node heartbeats, it goes through the regular assignments and updates the leafQueue clusterResources
based on what the parent passes in. When a node is removed or added, it updates the resource
limits. Neither of these applies to the calculation of whether it needs to unreserve or not.

Basically it comes down to this: is this information useful outside of the small window between
when it is calculated and when it's needed in assignContainer()? My thought is no.  And
you said it yourself in the last bullet above.  We have been referring to it as the userLimit,
though, and perhaps that is the problem.  I don't need to store the userLimit, I need to store whether
it needs to unreserve and if so how much.  Therefore it fits better as a local transient variable
rather than a globally stored one.  If you store just the userLimit then you need to recalculate
things, which I'm trying to avoid.
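
To make the "whether and how much" point concrete, here is a minimal sketch, not the actual
CapacityScheduler code. The class name, method name and the memory-only long values are all
assumptions made for illustration; the real scheduler works with Resource objects. It computes
the unreserve amount once and returns it as a local value instead of storing the user limit.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; none of these names come from the YARN code base.
final class UnreserveSketch {

    /**
     * Decides whether reservations must be released before a request can fit
     * under the user limit, and by how much (memory only, in MB).
     *
     * @return 0 if the request fits as is, a positive amount if that much must
     *         be unreserved first, or empty if unreserving cannot make it fit.
     */
    static OptionalLong amountToUnreserveMb(long usedInclReservedMb, long reservedMb,
                                            long requestMb, long userLimitMb) {
        long overMb = usedInclReservedMb + requestMb - userLimitMb;
        if (overMb <= 0) {
            return OptionalLong.of(0L);      // under the limit, nothing to unreserve
        }
        if (overMb <= reservedMb) {
            return OptionalLong.of(overMb);  // releasing reservations covers the overage
        }
        return OptionalLong.empty();         // over the limit even with everything unreserved
    }

    public static void main(String[] args) {
        // The result lives only in this local variable for the duration of one assignment pass.
        OptionalLong toUnreserve = amountToUnreserveMb(10240, 8192, 8192, 12288);
        System.out.println(toUnreserve);
    }
}
```

The caller never needs the user limit again after this point; it only needs the returned amount.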

I understand why we are storing the current information in ResourceLimits: it has to
do with headroom and parent limits and is recalculated at various points. But the current
implementation in canAssignToUser doesn't use headroom at all, and whether we need to unreserve
or not on the last call to assignContainers doesn't affect the headroom calculation.

Again, basically all we would be doing is placing extra global variable(s) in the ResourceLimits
class just to pass them on down a couple of functions. That to me is a parameter.   Now if we
had multiple things needing this or updating it, then to me it would fit better in ResourceLimits.
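
The two options being weighed can be sketched side by side. This is a hedged illustration only:
SharedLimits is a hypothetical stand-in, not the real ResourceLimits class, and both method
names and the memory-only long values are invented for the example.

```java
// Hypothetical illustration only; SharedLimits is NOT the real ResourceLimits class.
final class ParameterVsFieldSketch {

    /** Stand-in for a limits object that is shared and recalculated at various points. */
    static final class SharedLimits {
        long limitMb;
        // Extra field that exists only to carry a value down one call chain.
        long amountNeededUnreserveMb;
    }

    /** Option A: the callee reads the value from the shared object and must reset it. */
    static long assignViaSharedState(SharedLimits limits) {
        long toUnreserveMb = limits.amountNeededUnreserveMb;
        limits.amountNeededUnreserveMb = 0L; // otherwise a later call could see a stale value
        return toUnreserveMb;
    }

    /** Option B: the value is an explicit parameter and never outlives the call. */
    static long assignViaParameter(long amountNeededUnreserveMb) {
        return amountNeededUnreserveMb;
    }

    public static void main(String[] args) {
        SharedLimits limits = new SharedLimits();
        limits.limitMb = 12288L;
        limits.amountNeededUnreserveMb = 2048L;
        System.out.println(assignViaSharedState(limits));
        System.out.println(assignViaParameter(2048L));
    }
}
```

With a single producer and a single consumer, option B keeps the value's lifetime obvious;
option A starts to pay off once several callers read or update the same value.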
 



> Interaction between reservations and userlimit can result in significant ULF violation
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3434
>                 URL: https://issues.apache.org/jira/browse/YARN-3434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 containers,
> 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow
> the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
