hadoop-yarn-issues mailing list archives

From "MENG DING (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
Date Thu, 17 Sep 2015 15:56:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803122#comment-14803122 ]

MENG DING commented on YARN-4138:
---------------------------------

There is an issue with the current logic:

{code:title=RMContainerImpl.java}

+      if (!changeEvent.isIncrease()) {
+        // if this is a decrease request, if container was increased but not
+        // told to NM, we can consider previous increase is cancelled,
+        // unregister from the containerAllocationExpirer
+        container.containerAllocationExpirer.unregister(container
+            .getContainerId());
+      }  
{code}

Right now, when the RM processes a decrease request on a container, it tries to cancel any
ongoing increase on the same container by removing the container from the allocation expirer.
This is correct only if the target resource is less than or equal to the last confirmed resource;
otherwise it causes an inconsistency between the RM and the NM. For example:

1. A container is using 2G.
2. The AM requests an increase from 2G --> 8G; the scheduler allocates it and issues a token
to the AM.
3. The AM never uses the token, but requests a decrease from 8G --> 6G; the scheduler
decreases the resource to 6G and also removes the container from the allocation expirer.
4. The RM notifies the NM to decrease the resource to 6G, but since the NM is still using 2G,
the NM ignores the decrease message.
5. The container now has a 6G allocation in the RM but a 2G allocation in the NM.
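
The five steps above can be replayed in a small toy model. This is only a sketch to make the inconsistency concrete: `CurrentLogicModel`, its fields, and its methods are hypothetical names, not YARN classes, and the NM is assumed to ignore any decrease whose target is not below what it currently enforces.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical names, not the real RMContainerImpl or NM code.
final class CurrentLogicModel {
    static final String ID = "container_1";

    long rmGb;                                     // allocation recorded by the RM
    long nmGb;                                     // allocation enforced by the NM
    final Set<String> expirer = new HashSet<>();   // stand-in for containerAllocationExpirer

    CurrentLogicModel(long initialGb) {
        rmGb = initialGb;
        nmGb = initialGb;
    }

    // Step 2: the scheduler grants the increase and tracks the token for expiry.
    void grantIncrease(long targetGb) {
        rmGb = targetGb;
        expirer.add(ID);
    }

    // Steps 3-4: current logic unregisters on every decrease.
    void decrease(long targetGb) {
        rmGb = targetGb;
        expirer.remove(ID);
        // Assumption: the NM ignores a "decrease" that is not below its own value.
        if (targetGb < nmGb) {
            nmGb = targetGb;
        }
    }

    static CurrentLogicModel runScenario() {
        CurrentLogicModel c = new CurrentLogicModel(2); // step 1: using 2G
        c.grantIncrease(8);                             // step 2: 2G --> 8G, token never used
        c.decrease(6);                                  // steps 3-4: 8G --> 6G
        return c;
    }

    public static void main(String[] args) {
        CurrentLogicModel c = runScenario();
        System.out.println("RM=" + c.rmGb + "G NM=" + c.nmGb + "G tracked=" + c.expirer.contains(ID));
    }
}
```

In this model the container ends with 6G in the RM and 2G in the NM, and because it is no longer tracked by the expirer, nothing will ever reconcile the two values.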

In this ticket, we will add a last confirmed resource to RMContainer, and I propose to
unregister the container from the expirer only when the target resource is less than or equal to
the last confirmed resource. Using the above example, after the fix the behavior should be:

1. A container is using 2G.
2. The AM requests an increase from 2G --> 8G; the scheduler allocates it and issues a token
to the AM.
3. The AM requests a decrease from 8G --> 6G. The scheduler decreases it to 6G, but
does *not* remove the container from the allocation expirer.
4. The increase token expires, and the scheduler reverts the container resource from 6G to
2G.
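
A minimal sketch of the proposed rule, under stated assumptions: `ProposedLogicModel` and all of its members are hypothetical and do not correspond to the eventual patch. The sketch also assumes that a decrease at or below the last confirmed value becomes the new confirmed value immediately.

```java
// Toy model only: hypothetical names, not the real RMContainer API.
final class ProposedLogicModel {
    long rmGb;              // allocation recorded by the RM
    long lastConfirmedGb;   // last resource known to be in effect on the NM
    boolean inExpirer;      // stand-in for registration in the allocation expirer

    ProposedLogicModel(long initialGb) {
        rmGb = initialGb;
        lastConfirmedGb = initialGb;
    }

    // The scheduler grants an increase and starts tracking the token.
    void grantIncrease(long targetGb) {
        rmGb = targetGb;
        inExpirer = true;
    }

    // Unregister only when the target is <= the last confirmed resource.
    void decrease(long targetGb) {
        rmGb = targetGb;
        if (targetGb <= lastConfirmedGb) {
            lastConfirmedGb = targetGb; // assumption: such a decrease takes effect directly
            inExpirer = false;          // the pending increase is truly cancelled
        }
        // Otherwise the increase is still unconfirmed, so keep the expirer entry.
    }

    // The increase token expired without the AM using it: roll back.
    void onTokenExpired() {
        if (inExpirer) {
            rmGb = lastConfirmedGb;
            inExpirer = false;
        }
    }

    static ProposedLogicModel runScenario() {
        ProposedLogicModel c = new ProposedLogicModel(2); // step 1: using 2G
        c.grantIncrease(8);                               // step 2: 2G --> 8G
        c.decrease(6);                                    // step 3: 8G --> 6G, still tracked
        c.onTokenExpired();                               // step 4: revert to 2G
        return c;
    }

    public static void main(String[] args) {
        ProposedLogicModel c = runScenario();
        System.out.println("RM=" + c.rmGb + "G tracked=" + c.inExpirer);
    }
}
```

With this rule, a decrease to 1G in step 3 would instead drop below the confirmed 2G, so the container would be unregistered and no rollback would follow.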

Let me know if this makes sense or not. If yes, I will come up with a patch shortly.

> Roll back container resource allocation after resource increase token expires
> -----------------------------------------------------------------------------
>
>                 Key: YARN-4138
>                 URL: https://issues.apache.org/jira/browse/YARN-4138
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, nodemanager, resourcemanager
>            Reporter: MENG DING
>            Assignee: MENG DING
>         Attachments: YARN-4138-YARN-1197.1.patch
>
>
> In YARN-1651, after container resource increase token expires, the running container
> is killed.
> This ticket will change the behavior such that when a container resource increase token
> expires, the resource allocation of the container will be reverted back to the value before
> the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
