hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
Date Fri, 08 May 2015 18:12:01 GMT

     [ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-2331:
-----------------------------
    Attachment: YARN-2331v3.patch

Updated patch to trunk.

bq. Probably, we could set the default value for yarn.nodemanager.recovery.supervised to true.
Normally, when people add a node as an NM, they expect to use this node for a long time. So,
restart is expected?

The problem is that if the NM is not being supervised, then when it goes down there isn't going
to be a timely restart.  That leaves containers unmanaged on the node (e.g.: they can't be
killed by YARN since the NM is down).  The user may eventually get around to restarting the NM,
but if that takes hours or days it doesn't help much.

Before NM restart support existed, the NM would try to kill all active containers on shutdown
to prevent this.  With restart support this is undesirable _unless_ the NM is going down and
isn't going to be restarted in a timely manner (i.e.: no upgrade is in progress and the NM
isn't being supervised).

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> ------------------------------------------------------------------------
>
>                 Key: YARN-2331
>                 URL: https://issues.apache.org/jira/browse/YARN-2331
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-2331.patch, YARN-2331v2.patch, YARN-2331v3.patch
>
>
> When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being performed.  In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being performed.  In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered.
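The three scenarios reduce to one small decision. The Java sketch below is a hypothetical illustration, not the code in the YARN-2331 patches: the class and method names are invented, and the three booleans are assumed stand-ins for whether NM recovery is enabled, whether the NM runs under supervision (the role yarn.nodemanager.recovery.supervised plays in the discussion above), and whether a rolling upgrade is in progress.

```java
// Hypothetical sketch of the YARN-2331 shutdown decision.
// Names are illustrative only; this is not the actual NodeManager implementation.
class ShutdownPolicySketch {

    /** Returns true if the NM should kill its active containers when it shuts down. */
    static boolean shouldKillContainersOnShutdown(boolean recoveryEnabled,
                                                  boolean supervised,
                                                  boolean rollingUpgrade) {
        if (!recoveryEnabled) {
            // Without restart support nothing can recover the containers later,
            // so kill them, as the NM did before restart support existed.
            return true;
        }
        // Case 1: a supervisor restarts the NM promptly -> preserve containers.
        // Case 3: a rolling upgrade makes a restart imminent -> preserve containers.
        // Case 2: neither applies -> kill, since no timely restart is expected.
        return !supervised && !rollingUpgrade;
    }

    public static void main(String[] args) {
        System.out.println("case 1 (supervised):           kill="
            + shouldKillContainersOnShutdown(true, true, false));
        System.out.println("case 2 (unsupervised):         kill="
            + shouldKillContainersOnShutdown(true, false, false));
        System.out.println("case 3 (unsupervised+upgrade): kill="
            + shouldKillContainersOnShutdown(true, false, true));
    }
}
```

Running main prints kill=false, kill=true and kill=false for cases 1 to 3. The sketch also shows why defaulting the supervised flag to true is risky: on a node with no supervisor, case 1 would preserve containers that nothing restarts in time to manage.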



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
