hadoop-yarn-issues mailing list archives

From "MENG DING (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires
Date Fri, 18 Dec 2015 19:03:47 GMT

     [ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MENG DING updated YARN-4138:
----------------------------
    Attachment: YARN-4138.3.patch

Attached the latest patch, which addresses [~jianhe]'s and [~sandflee]'s comments.

I think the issue brought up by [~jianhe] is about race conditions between a normal resource
decrease and a resource rollback. The proposed fix is to guard resource rollback with the
same sequence of locks as the normal resource decrease, i.e., lock on application first, then
on scheduler.

With the proposed fix, we can walk through the original example:
1. AM asks to increase 2G -> 8G, and the request is approved by RM
2. AM does not increase the container and instead asks to decrease it to 1G; at the same time,
the increase expiration logic is triggered:
* If the normal decrease is processed first: RM decreases 8G -> 1G (allocated and lastConfirmed
are now set to 1G), and then the rollback is processed: RM rolls back 1G -> 1G (skipped)
* If the rollback is processed first: RM rolls back 8G -> 2G (allocated and lastConfirmed are
now set to 2G), and then the normal decrease is processed: RM decreases 2G -> 1G
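
The walkthrough above can be sketched in Java as follows. This is only an illustration of
the lock ordering, not the code in the patch: the class RollbackSketch, its fields, and its
methods are hypothetical names, and plain Object monitors stand in for the real application
and scheduler locks.

```java
// Hypothetical sketch: rollback and normal decrease take the same locks
// in the same order (application first, then scheduler).
final class RollbackSketch {
    private final Object appLock = new Object();        // stand-in for the application lock
    private final Object schedulerLock = new Object();  // stand-in for the scheduler lock

    private int allocatedGb;      // what RM has currently allocated
    private int lastConfirmedGb;  // last size confirmed for the container

    RollbackSketch(int lastConfirmedGb, int allocatedGb) {
        this.lastConfirmedGb = lastConfirmedGb;
        this.allocatedGb = allocatedGb;
    }

    // Normal decrease requested by the AM: both values move to the target.
    void decrease(int targetGb) {
        synchronized (appLock) {
            synchronized (schedulerLock) {
                allocatedGb = targetGb;
                lastConfirmedGb = targetGb;
            }
        }
    }

    // Rollback after the increase token expires: revert to lastConfirmed,
    // or skip when there is nothing left to revert.
    void rollback() {
        synchronized (appLock) {
            synchronized (schedulerLock) {
                if (allocatedGb == lastConfirmedGb) {
                    return; // e.g. 1G -> 1G, skipped
                }
                allocatedGb = lastConfirmedGb;
            }
        }
    }

    int allocatedGb() {
        synchronized (schedulerLock) { return allocatedGb; }
    }

    int lastConfirmedGb() {
        synchronized (schedulerLock) { return lastConfirmedGb; }
    }

    public static void main(String[] args) throws InterruptedException {
        // Step 1: increase 2G -> 8G approved, but never confirmed by the AM.
        RollbackSketch container = new RollbackSketch(2, 8);

        // Step 2: the decrease to 1G and the expiration rollback race.
        Thread decrease = new Thread(() -> container.decrease(1));
        Thread rollback = new Thread(container::rollback);
        decrease.start();
        rollback.start();
        decrease.join();
        rollback.join();

        System.out.println("allocated=" + container.allocatedGb()
                + "G lastConfirmed=" + container.lastConfirmedGb() + "G");
    }
}
```

Because both operations hold the same two locks in the same order, they cannot interleave or
deadlock against each other, and either ordering leaves allocated and lastConfirmed at 1G.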


> Roll back container resource allocation after resource increase token expires
> -----------------------------------------------------------------------------
>
>                 Key: YARN-4138
>                 URL: https://issues.apache.org/jira/browse/YARN-4138
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, nodemanager, resourcemanager
>            Reporter: MENG DING
>            Assignee: MENG DING
>         Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running container is killed.
> This ticket will change the behavior such that when a container resource increase token expires, the resource allocation of the container will be reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
