hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-73) nodemanager should cleanup running containers when it starts
Date Tue, 25 Sep 2012 14:48:08 GMT

    [ https://issues.apache.org/jira/browse/YARN-73?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462801#comment-13462801 ]

Robert Joseph Evans commented on YARN-73:

[~kihwal] also mentioned to me that we could make a best-effort attempt in a JVM shutdown hook, and
then have this as a backup for anything that we were not able to kill, which seems very reasonable.
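
To make the shutdown-hook idea concrete, here is a minimal sketch. It is not taken from the NodeManager code: the class name, the registry of kill actions, and the method names are all hypothetical. The only real API it relies on is Runtime.addShutdownHook.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the NodeManager code.
final class ShutdownCleanupSketch {
    // Hypothetical registry: container id -> action that kills that container.
    private final Map<String, Runnable> killers = new ConcurrentHashMap<>();

    void track(String containerId, Runnable killAction) {
        killers.put(containerId, killAction);
    }

    int remaining() {
        return killers.size();
    }

    // Best effort: try every container and never let one failure stop the rest.
    // Returns how many kill actions completed without throwing.
    int killAll() {
        int killed = 0;
        for (Map.Entry<String, Runnable> e : killers.entrySet()) {
            try {
                e.getValue().run();
                killers.remove(e.getKey());
                killed++;
            } catch (RuntimeException ex) {
                // Left in the registry; cleanup at the next startup is the backup.
            }
        }
        return killed;
    }

    // Registers killAll() to run on normal JVM exit or SIGTERM.
    // The hook does not run on SIGKILL or a JVM crash.
    Thread installHook() {
        Thread hook = new Thread(() -> { killAll(); }, "container-cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Because a shutdown hook is skipped when the JVM is killed hard or crashes, anything it misses still has to be handled by the cleanup at startup, which is why the hook can only be a best effort.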

> nodemanager should cleanup running containers when it starts
> ------------------------------------------------------------
>                 Key: YARN-73
>                 URL: https://issues.apache.org/jira/browse/YARN-73
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
> Currently the nodemanager doesn't clean up running containers when it gets restarted.
This can cause containers to get lost and stick around forever. We've seen this happen multiple
times when the RM is restarted. When the RM is brought back up, it doesn't know what
was running on the cluster, so it tells the NMs to reboot, and when an NM reboots it loses track of what
it had running. If any containers are behaving badly, there is no one left that
knows about them to kill them.
> We should kill any running containers when the nodemanager is being started.  Note that
when the NM is being brought up it needs to somehow figure out what containers were running
and be sure it doesn't kill anything it shouldn't.
> Note, we should also try to kill any running containers when the node manager is shutting
down (jira 4213 was filed for this).
> This might change a bit when RM restart is implemented if tasks can actually survive
across RM/NM being rebooted, but that can be addressed at that point.
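
One possible shape for the startup cleanup described in the issue, as a sketch under stated assumptions: it assumes each container's pid was recorded in a "*.pid" file under a directory that only the NM writes to. That layout, the class, and the method names are hypothetical and are not confirmed by the issue. ProcessHandle is the standard Java 9+ API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch; the pid-file layout is an assumption, not the NM's design.
final class StartupCleanupSketch {
    // Passes every pid recorded in pidDir/*.pid to killer, then removes the record.
    // Only pids the NM itself recorded are touched, so unrelated processes are left alone.
    static List<Long> cleanupLeftovers(Path pidDir, LongConsumer killer) throws IOException {
        List<Long> handled = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(pidDir, "*.pid")) {
            for (Path f : files) {
                long pid;
                try {
                    byte[] raw = Files.readAllBytes(f);
                    pid = Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
                } catch (NumberFormatException | IOException e) {
                    continue; // Unreadable record: skip it rather than guess a pid.
                }
                killer.accept(pid);
                handled.add(pid);
                Files.deleteIfExists(f);
            }
        }
        return handled;
    }

    // A possible killer. Caveat: pids can be reused after a reboot, so a real
    // implementation would need an extra check before killing.
    static void killByPid(long pid) {
        ProcessHandle.of(pid).ifPresent(ProcessHandle::destroyForcibly);
    }
}
```

At NM startup this could be called as cleanupLeftovers(dir, StartupCleanupSketch::killByPid), where dir is the hypothetical pid directory. Taking the killer as a parameter keeps the scan logic testable without signalling real processes.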

