hadoop-yarn-issues mailing list archives

From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources
Date Mon, 20 Nov 2017 09:13:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258977#comment-16258977
] 

Wilfred Spiegelenburg commented on YARN-7534:
---------------------------------------------

I would like to work on this one, if you don't mind.

I think two things are getting mixed up here. The resources used by a queue are not tied to
a node: queue usage is the sum of the resources of all containers that belong to applications
running in the queue. A node heartbeat that reports changed usage does not mean that an
application in this queue caused the change. A different queue or application could have
added a container on that node.
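
To make the difference between queue usage and node usage concrete, here is a minimal sketch.
It does not use the Hadoop classes: `QueueUsageSketch`, `Container`, and both methods are
hypothetical names made up for illustration, and memory in MB stands in for a full resource
vector.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only, not the Hadoop scheduler classes.
final class QueueUsageSketch {
    // A running container: the queue its application belongs to, the node
    // it runs on, and its memory in MB.
    static final class Container {
        final String queue;
        final String node;
        final long memoryMb;

        Container(String queue, String node, long memoryMb) {
            this.queue = queue;
            this.node = node;
            this.memoryMb = memoryMb;
        }
    }

    // Queue usage: sum over the containers of applications in the queue,
    // regardless of which node they run on.
    static long queueUsageMb(List<Container> containers, String queue) {
        long total = 0;
        for (Container c : containers) {
            if (c.queue.equals(queue)) {
                total += c.memoryMb;
            }
        }
        return total;
    }

    // Node usage: sum over the containers on the node, regardless of queue.
    static long nodeUsageMb(List<Container> containers, String node) {
        long total = 0;
        for (Container c : containers) {
            if (c.node.equals(node)) {
                total += c.memoryMb;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Container> running = Arrays.asList(
            new Container("a", "n1", 1024),
            new Container("b", "n1", 2048),
            new Container("a", "n2", 512));
        System.out.println(queueUsageMb(running, "a")); // prints 1536
        System.out.println(nodeUsageMb(running, "n1")); // prints 3072
    }
}
```

In this sketch, adding a container for queue "b" on node "n1" changes the usage of "n1" but
leaves the usage of queue "a" untouched.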

We are also not allocating anything at this point, so we have not gone over the limit. That
check is done later, when the application is updated. Here we only run a preliminary check
to see whether we can offer this node to the queue. Another point to take into account: we
are not yet looking at what the application asked for. That is the next step, just below,
where we iterate over all the applications that have a demand:

{code}
    for (FSAppAttempt sched : fetchAppsWithDemand(true)) {
      if (SchedulerAppUtils.isPlaceBlacklisted(sched, node, LOG)) {
        continue;
      }
      assigned = sched.assignContainer(node);
{code}

This is the earliest point at which we can find out what the ask is. If the queue has more
than one application with a demand, we walk over the list. We call [assignContainer |https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L830]
and that is where the checks happen.
One of the checks we perform is in hasContainerForNode for the FSAppAttempt:
{code}
    } else if (!getQueue().fitsInMaxShare(resource)) {
      // The requested container must fit in queue maximum share
      updateAMDiagnosticMsg(resource,
          " exceeds current queue or its parents maximum resource allowed).");

      ret = false;
{code}

That makes the allocation fail, so we drop out and check the next request for the application.
If all of its requests fail, we move on to the next application in the list of apps with demand.
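
The two stages described above can be sketched as follows. This is a simplified illustration
and not the FairScheduler code: `TwoStageCheckSketch`, `Queue`, `canBeOfferedNode`, and
`assign` are hypothetical names, memory in MB stands in for a full resource, parent queues are
ignored, and the `fitsInMaxShare` here only assumes that the real method compares usage plus
the request against the maximum share.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the two checks; not the Hadoop classes.
final class TwoStageCheckSketch {
    static final class Queue {
        final long maxMb;
        long usedMb;

        Queue(long maxMb, long usedMb) {
            this.maxMb = maxMb;
            this.usedMb = usedMb;
        }

        // Stage 1: preliminary check. Looks only at current usage, not at
        // what any application is asking for.
        boolean canBeOfferedNode() {
            return usedMb <= maxMb;
        }

        // Stage 2: per-request check. Assumed semantics: current usage plus
        // the requested container must stay within the maximum.
        boolean fitsInMaxShare(long requestMb) {
            return usedMb + requestMb <= maxMb;
        }
    }

    // Walks the applications with demand, then each application's requests.
    // Assigns the first request that fits and returns its size, or 0 if
    // nothing can be assigned.
    static long assign(Queue queue, List<List<Long>> requestsPerApp) {
        if (!queue.canBeOfferedNode()) {
            return 0;
        }
        for (List<Long> requests : requestsPerApp) {
            for (long requestMb : requests) {
                if (queue.fitsInMaxShare(requestMb)) {
                    queue.usedMb += requestMb;
                    return requestMb;
                }
                // Request does not fit: try the next request, then the next app.
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Queue queue = new Queue(4096, 3072);
        List<List<Long>> demand = Arrays.asList(
            Arrays.asList(2048L, 1024L), // first application
            Arrays.asList(512L));        // second application
        System.out.println(assign(queue, demand)); // prints 1024
        System.out.println(assign(queue, demand)); // prints 0
    }
}
```

In the first call the queue passes the preliminary check at 3072 of 4096 MB, the 2048 MB
request is rejected, and the 1024 MB request is assigned. In the second call the queue still
passes the preliminary check at exactly 4096 MB, but every request is rejected by the
per-request check, so usage never exceeds the maximum.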

Do you have any logs that show that this is not working as it should?


> Fair scheduler assign resources may exceed maxResources
> -------------------------------------------------------
>
>                 Key: YARN-7534
>                 URL: https://issues.apache.org/jira/browse/YARN-7534
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the queue have exceeded
> *maxResources* before assigning the container. As a result, after this container is
> assigned, the queue may use more resources than *maxResources*.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

