hadoop-yarn-issues mailing list archives

From "Maysam Yabandeh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
Date Tue, 20 Aug 2013 04:07:52 GMT

    [ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744683#comment-13744683 ]

Maysam Yabandeh commented on YARN-1076:
---------------------------------------

Hi [~ojoshi]. I am observing the problem in a unit test that uses MiniYarnCluster. The explanation,
however, is based solely on a code walkthrough. I did not submit the test case because the problem
does not always show up, owing to the non-determinism in MiniYarnCluster.

Anyway, I see that you have already covered that in the objectives of YARN-957:

| Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case
if the original request was made without any locality then scheduler should unreserve memory
on nm1 and allocate requested 2048MB container on nm2.
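
The scenario quoted above can be sketched as a toy model. This is only an illustration of the
intended behaviour, not the CapacityScheduler code: the class UnreserveSketch, its fields, and
onNodeUpdate are hypothetical names, and "without any locality" is modelled as a single boolean.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of one pending container request; not YARN code.
class UnreserveSketch {
    final Map<String, Integer> freeMb = new HashMap<>(); // node name -> free memory in MB
    String reservedOn;  // node currently holding the reservation, or null
    String allocatedOn; // node where the container was placed, or null

    // Called when a node heartbeats. Returns true if the container was allocated on it.
    boolean onNodeUpdate(String node, int requestMb, boolean noLocalityConstraint) {
        int free = freeMb.getOrDefault(node, 0);
        if (allocatedOn != null || free < requestMb) {
            return false; // already satisfied, or this node cannot fit the request
        }
        boolean reservedElsewhere = reservedOn != null && !reservedOn.equals(node);
        if (reservedElsewhere && !noLocalityConstraint) {
            return false; // request is tied to the reserved node; keep waiting
        }
        reservedOn = null;                  // unreserve (e.g. drop the nm1 reservation)
        freeMb.put(node, free - requestMb); // allocate on the node that has room
        allocatedOn = node;
        return true;
    }

    public static void main(String[] args) {
        UnreserveSketch s = new UnreserveSketch();
        s.freeMb.put("nm1", 0);
        s.freeMb.put("nm2", 2048);
        s.reservedOn = "nm1";
        System.out.println(s.onNodeUpdate("nm2", 2048, true)); // true
        System.out.println(s.allocatedOn);                     // nm2
    }
}
```

With the locality flag set to false, the same update on nm2 is declined and the reservation on nm1
stays in place.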

                
> RM gets stuck with a reservation, ignoring new containers
> ---------------------------------------------------------
>
>                 Key: YARN-1076
>                 URL: https://issues.apache.org/jira/browse/YARN-1076
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Maysam Yabandeh
>            Priority: Minor
>
> LeafQueue#assignContainers rejects newly available containers if #needContainers
> returns false:
> {code:java}
>           if (!needContainers(application, priority, required)) {
>             continue;
>           }
> {code}
> When the application has already reserved all the required containers, #needContainers
> returns false as long as no starvation is reported:
> {code:java}
> return (((starvation + requiredContainers) - reservedContainers) > 0);
> {code}
> where starvation is computed from the number of attempts to re-reserve a resource. On the
> other hand, a resource is re-reserved via #assignContainersOnNode only after the
> #needContainers precondition has passed:
> {code:java}
>           // Do we need containers at this 'priority'?
>           if (!needContainers(application, priority, required)) {
>             continue;
>           }
>           //.
>           //.
>           //.
>           
>           // Try to schedule
>           CSAssignment assignment =  
>             assignContainersOnNode(clusterResource, node, application, priority, 
>                 null);
> {code}
> In other words, once needContainers returns false due to a reservation, it keeps rejecting
> newly available resources: no re-reservation is ever attempted, so starvation never increases.
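
The cycle described above can be reproduced with a small stand-alone simulation. This is a minimal
sketch, not the LeafQueue code: only the needContainers predicate is taken from the snippet quoted
in the description. The class name, passesReachingAssign, and the "starvation++" bookkeeping are
hypothetical stand-ins for the real re-reservation accounting.

```java
// Hypothetical stand-alone simulation; not the actual LeafQueue implementation.
class ReservationStallSketch {
    // Same predicate as the one quoted in the issue description.
    static boolean needContainers(int starvation, int requiredContainers, int reservedContainers) {
        return ((starvation + requiredContainers) - reservedContainers) > 0;
    }

    // Simulates 'offers' node updates and counts how many get past the precondition.
    static int passesReachingAssign(int required, int reserved, int offers) {
        int starvation = 0;
        int reached = 0;
        for (int i = 0; i < offers; i++) {
            if (!needContainers(starvation, required, reserved)) {
                continue; // offer rejected; starvation is never updated on this path
            }
            reached++;    // stands in for the call to assignContainersOnNode
            starvation++; // assumed stand-in: starvation only grows after the check passes
        }
        return reached;
    }

    public static void main(String[] args) {
        // Everything required is already reserved: every offer is rejected.
        System.out.println(passesReachingAssign(1, 1, 10)); // 0
        // One container still unreserved: offers get through.
        System.out.println(passesReachingAssign(2, 1, 10)); // 10
    }
}
```

In the first call, starvation starts at zero and can only change after the precondition passes, so
the predicate stays false for every offer, which is the stuck state this issue reports.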

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
