hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
Date Tue, 22 Jul 2014 15:16:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070372#comment-14070372 ]

Jason Lowe commented on YARN-2331:
----------------------------------

We can distinguish between supervised and unsupervised operation via a config setting.  Determining
whether an unsupervised shutdown is due to a rolling upgrade is a bit trickier.  Some of the options
there include:

- Add an admin port to NMs and a corresponding CLI command to send commands to the port. 
There's a lot of boilerplate that goes along with this, but it is the most flexible option
if we ever want to add other admin commands to an NM.
- Add a REST API to do this, with appropriate authentication to make sure not just anyone
can cause an NM shutdown.
- Use a second signal handler to indicate a rolling upgrade shutdown.  It would work just like
today's SIGTERM handler for a normal shutdown, but it would be registered for another signal
such as SIGINT.  The shell scripts could have a new command that performs the rolling upgrade
shutdown by sending the new signal rather than SIGTERM.  This would be relatively simple to
implement on POSIX platforms like Linux, but it has portability ramifications for non-POSIX
platforms like Windows.
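
To make the signal option a bit more concrete, here is a minimal sketch of the mapping from
received signal to shutdown kind.  This is illustrative only and not a proposed patch: the
class, enum, and method names are hypothetical, and the choice of SIGINT for the rolling
upgrade case is just the example signal mentioned above.  Registering the actual handler
with the JVM is deliberately left out.

```java
/** Hypothetical helper; none of these names exist in YARN. */
final class SignalShutdownMapper {

    /** Illustrative shutdown kinds for an unsupervised NM. */
    enum ShutdownKind { NORMAL, ROLLING_UPGRADE }

    private SignalShutdownMapper() {}

    /**
     * Maps a signal name ("SIGTERM", "TERM", "sigint", ...) to a shutdown kind.
     * Assumption: SIGTERM keeps its current meaning (normal shutdown) and
     * SIGINT is the extra signal used to mark a rolling upgrade shutdown.
     */
    static ShutdownKind kindFor(String signalName) {
        String name = signalName.toUpperCase(java.util.Locale.ROOT);
        if (name.startsWith("SIG")) {
            name = name.substring(3); // accept both "SIGINT" and "INT"
        }
        switch (name) {
            case "TERM":
                return ShutdownKind.NORMAL;
            case "INT":
                return ShutdownKind.ROLLING_UPGRADE;
            default:
                throw new IllegalArgumentException("Unhandled signal: " + signalName);
        }
    }
}
```

A shell script command for the rolling upgrade shutdown would then only need to send the
second signal instead of SIGTERM.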

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> ------------------------------------------------------------------------
>
>                 Key: YARN-2331
>                 URL: https://issues.apache.org/jira/browse/YARN-2331
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>
> When the NM is shutting down with restart support enabled, there are scenarios we'd like
> to distinguish and behave accordingly:
> 1. The NM is running under supervision.  In that case containers should be preserved so
>    the automatic restart can recover them.
> 2. The NM is not running under supervision and a rolling upgrade is not being performed.
>    In that case the shutdown should kill all containers since it is unlikely the NM will be
>    restarted in a timely manner to recover them.
> 3. The NM is not running under supervision and a rolling upgrade is being performed.
>    In that case the shutdown should not kill all containers since a restart is imminent due to
>    the rolling upgrade and the containers will be recovered.
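
The three cases in the description above reduce to a small decision function.  The sketch
below is only an illustration under stated assumptions: the class and method names are
hypothetical, and how the two inputs are obtained (a config setting for supervision, and some
signal, REST call, or admin command for the rolling upgrade) is left open.

```java
/** Hypothetical policy helper; not part of YARN. */
final class NmShutdownPolicy {

    private NmShutdownPolicy() {}

    /**
     * Decides whether an NM with restart support enabled should kill its
     * containers on shutdown.
     *
     * @param supervised     true if the NM runs under supervision
     * @param rollingUpgrade true if the shutdown is part of a rolling upgrade
     * @return true only when the NM is unsupervised and no rolling upgrade is
     *         in progress, since a timely restart is then unlikely
     */
    static boolean shouldKillContainers(boolean supervised, boolean rollingUpgrade) {
        if (supervised) {
            // Case 1: the automatic restart will recover the containers.
            return false;
        }
        // Case 2 kills containers; case 3 preserves them for the imminent restart.
        return !rollingUpgrade;
    }
}
```

Note that the rolling upgrade flag has no effect when the NM is supervised, because
containers are preserved in that case either way.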



--
This message was sent by Atlassian JIRA
(v6.2#6252)
