hadoop-yarn-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-128) Resurrect RM Restart
Date Tue, 25 Sep 2012 01:04:08 GMT

    [ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462324#comment-13462324
] 

Thomas Graves commented on YARN-128:
------------------------------------

{quote}
RM sends commands back to clean up containers/applications. Can orphans be left behind on
nodes after RM restart? Will NM be able to auto-clean containers?
{quote}

Containers can currently be lost. See YARN-72 and YARN-73. Once it's changed so the RM doesn't
always reboot the NMs, that will get a bit better, but it's still possible, so we will have to
handle it somehow.  Since the NM could crash, it almost needs a way to check on startup what's
running and at that point decide whether it should clean them up. It does have a .pid file for
the containers, but you would have to be sure that the process is the same one as when the NM
went down.
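
To illustrate the "same process" concern, here is a rough sketch of one possible check, not
what the NM actually does. It assumes, hypothetically, that a start time was recorded next to
the pid when the container was launched (the .pid file today holds only the pid), and that
the node is Linux, where field 22 of /proc/<pid>/stat is the process start time in clock
ticks. The function names and the three result strings are made up for the example.

```python
from typing import Callable, Optional


def linux_start_ticks(pid: int) -> Optional[int]:
    """Return the start time (clock ticks since boot) of pid, or None if it is gone.

    Linux-only: reads field 22 (starttime) of /proc/<pid>/stat.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except OSError:
        # No such process, no /proc, or not readable: treat as not found.
        return None
    # The comm field (2) is parenthesised and may contain spaces, so split
    # only the text after the last ')'. That text starts at field 3 (state),
    # which puts field 22 at index 19.
    fields = stat[stat.rfind(")") + 2:].split()
    return int(fields[19])


def classify(recorded_pid: int,
             recorded_start: int,
             lookup: Callable[[int], Optional[int]] = linux_start_ticks) -> str:
    """Decide what a recorded (pid, start time) pair refers to now.

    Returns one of the hypothetical labels:
      "gone"         - no process with that pid exists any more
      "pid_reused"   - the pid exists but belongs to a different process
      "same_process" - the pid still refers to the recorded process
    """
    current_start = lookup(recorded_pid)
    if current_start is None:
        return "gone"
    if current_start != recorded_start:
        return "pid_reused"
    return "same_process"
```

On startup the NM could run something like classify() over each recorded container and only
consider cleaning up the ones that come back as "same_process"; a "pid_reused" result means
the pid now belongs to an unrelated process that must not be killed. The lookup is passed in
as a parameter so the decision logic can be exercised without a real /proc.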
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
