hadoop-yarn-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-128) Resurrect RM Restart
Date Tue, 25 Sep 2012 01:04:08 GMT

    [ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462324#comment-13462324 ]

Thomas Graves commented on YARN-128:

RM sends commands back to clean up containers/applications. Can orphans be left behind on
nodes after RM restart? Will NM be able to auto-clean containers?

Containers can currently be lost. See YARN-72 and YARN-73. Once it's changed so the RM doesn't
always reboot the NMs, that will get a bit better, but it's still possible, so we will have to
handle it somehow. Since the NM could crash, it almost needs a way to check on startup what's
running and at that point decide if it should clean them up. It does have a .pid file for
the containers, but you would have to be sure that process is the same one as when the NM went down.
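
To illustrate the concern about trusting a .pid file after a restart, here is a rough sketch (not NM code) of a startup check that guards against PID reuse. It assumes something extra that the NM does not necessarily have today: a recorded process start time stored next to the PID. The names `read_pid`, `linux_start_ticks`, `classify_container` and the `recorded_start` value are all hypothetical, and the start-time lookup relies on the Linux /proc filesystem.

```python
from pathlib import Path
from typing import Callable, Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        return int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return None


def linux_start_ticks(pid: int) -> Optional[int]:
    """Return the process start time in clock ticks since boot (Linux only).

    Returns None if the process is not running or /proc is unavailable.
    """
    try:
        stat = Path(f"/proc/{pid}/stat").read_text()
        # The command name (field 2) may contain spaces or parentheses,
        # so split after the last ')' and count fields from field 3 onward.
        fields = stat.rsplit(")", 1)[1].split()
        return int(fields[19])  # field 22 of /proc/<pid>/stat is starttime
    except (OSError, IndexError, ValueError):
        return None


def classify_container(
    pid_file: Path,
    recorded_start: int,  # assumption: saved by the NM when the container launched
    start_lookup: Callable[[int], Optional[int]] = linux_start_ticks,
) -> str:
    """Decide what a leftover .pid file refers to after a restart."""
    pid = read_pid(pid_file)
    if pid is None:
        return "unreadable"
    current_start = start_lookup(pid)
    if current_start is None:
        return "not_running"  # nothing to clean up except the stale file
    if current_start == recorded_start:
        return "same_process"  # the original container is still running
    return "pid_reused"  # an unrelated process now owns this PID; do not kill it
```

Only the "same_process" case would be a candidate for cleanup. Comparing start times is one possible way to tell a surviving container from an unrelated process that happened to get the same PID; other identity checks could be used instead.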
> Resurrect RM Restart 
> ---------------------
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira