hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart
Date Wed, 23 Apr 2014 22:06:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978983#comment-13978983 ]

Ming Ma commented on YARN-1354:
-------------------------------

Yeah, the "FINISH_APP message lost" case can be addressed by having the NM ask the RM for the
ground truth. Or can the NM get the information from the locally running container instances?
Either way, it should address the scenario where the NM crashes and comes back a couple of days
later with stale apps and other state info.
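
To make the "ask the RM for the ground truth" idea concrete, here is a minimal sketch of the
reconciliation step, assuming the NM has a set of application IDs recovered from its local state
and the RM can report which applications are still running. The class and method names
(RecoveredAppReconciler, staleApps, liveApps) are hypothetical and not part of Hadoop or the
attached patch; application IDs are modelled as plain strings.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not a Hadoop class: compares NM-recovered apps with the RM's view. */
final class RecoveredAppReconciler {
    private RecoveredAppReconciler() {}

    /** Apps the NM recovered locally that the RM no longer reports as running. */
    static Set<String> staleApps(Set<String> recoveredByNm, Set<String> runningPerRm) {
        Set<String> stale = new TreeSet<>(recoveredByNm);
        stale.removeAll(runningPerRm);
        return stale;
    }

    /** Apps that were recovered locally and are still running according to the RM. */
    static Set<String> liveApps(Set<String> recoveredByNm, Set<String> runningPerRm) {
        Set<String> live = new TreeSet<>(recoveredByNm);
        live.retainAll(runningPerRm);
        return live;
    }
}
```

In this sketch the stale set is what the NM would clean up after a long outage, which also covers
an application whose FINISH_APP message was lost while the NM was down.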

nmStore.start() is called from NodeManager's serviceInit. Should it be called from NodeManager's
serviceStart instead, or should nmStore be added via addService? Using addService would also make
sure that NodeManager's serviceStop calls nmStore.stop().
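
The lifecycle point can be illustrated with a simplified stand-in for a composite service. This is
only a sketch: MiniService and MiniCompositeService are hypothetical classes written for this
example, not Hadoop's AbstractService or CompositeService, and they only record the order of
lifecycle calls. They show why registering a child through addService means the parent's stop
also stops the child.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a lifecycle service; NOT Hadoop's AbstractService. */
class MiniService {
    final String name;
    final List<String> log;

    MiniService(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void init()  { log.add(name + ".init"); }
    void start() { log.add(name + ".start"); }
    void stop()  { log.add(name + ".stop"); }
}

/** Simplified stand-in for a composite parent; NOT Hadoop's CompositeService. */
class MiniCompositeService extends MiniService {
    private final List<MiniService> children = new ArrayList<>();

    MiniCompositeService(String name, List<String> log) {
        super(name, log);
    }

    /** Registered children follow the parent's lifecycle automatically. */
    void addService(MiniService child) { children.add(child); }

    @Override void init() {
        for (MiniService c : children) c.init();
        super.init();
    }

    @Override void start() {
        for (MiniService c : children) c.start();
        super.start();
    }

    @Override void stop() {
        // Stop children in reverse registration order, then the parent itself.
        for (int i = children.size() - 1; i >= 0; i--) children.get(i).stop();
        super.stop();
    }
}
```

With a child named "nmStore" added to a parent named "NodeManager", calling init, start, and stop
on the parent drives the same three calls on the child, so no explicit nmStore.stop() is needed.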

> Recover applications upon nodemanager restart
> ---------------------------------------------
>
>                 Key: YARN-1354
>                 URL: https://issues.apache.org/jira/browse/YARN-1354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1354-v1.patch
>
>
> The set of active applications in the nodemanager context needs to be recovered for
> work-preserving nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
