hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1815) RM should recover only Managed AMs
Date Thu, 13 Mar 2014 17:59:48 GMT

    [ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933639#comment-13933639 ]

Jian He commented on YARN-1815:

Thanks, Karthik, for the patch.
For now, it should be fine to move an unmanaged AM (UMA) to the Failed state, since a UMA does
not save its final state and RM restart doesn't support UMAs. The core change looks good.

Test case: we need a more thorough test case to verify that a UMA is moved to the Failed state
after the RM restarts, using two MockRMs like the ones in TestRMRestart. The bigger problem is
that if an unmanaged application is not added back to completedApps in RMAppManager after RM
restart via the FinalTransition, it'll never be removed from the state store, because we remove
applications from the state store when completedApps in RMAppManager goes beyond the max-app-limit.
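
To illustrate the concern, here is a minimal sketch in Java. It is a toy model, not the
actual RMAppManager code: the class CompletedAppsSketch, its methods, the limit of 1, and the
application IDs are all hypothetical and only mimic the behavior described above, where
eviction from the state store is driven solely by the completedApps list.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model (NOT the real RMAppManager) of completed-app eviction from the state store. */
class CompletedAppsSketch {
    // Hypothetical stand-in for the max-app-limit on completed applications.
    private final int maxCompletedApps;
    // Completed applications, oldest first.
    private final Deque<String> completedApps = new ArrayDeque<>();
    // Hypothetical stand-in for the RM state store.
    private final Set<String> stateStore = new HashSet<>();

    CompletedAppsSketch(int maxCompletedApps) {
        this.maxCompletedApps = maxCompletedApps;
    }

    /** Simulates an application whose state was already persisted before the RM restarted. */
    void recoverFromStore(String appId) {
        stateStore.add(appId);
    }

    /** Simulates the final transition: record the app as completed, then evict over-limit apps. */
    void finishApplication(String appId) {
        completedApps.addLast(appId);
        while (completedApps.size() > maxCompletedApps) {
            String oldest = completedApps.removeFirst();
            stateStore.remove(oldest); // the only place the store is cleaned up
        }
    }

    boolean isInStateStore(String appId) {
        return stateStore.contains(appId);
    }

    public static void main(String[] args) {
        CompletedAppsSketch rm = new CompletedAppsSketch(1);
        rm.recoverFromStore("unmanaged-app");
        rm.recoverFromStore("managed-app-1");
        rm.recoverFromStore("managed-app-2");

        // The unmanaged app is failed on recovery but never reaches finishApplication(),
        // so it is never tracked in completedApps.
        rm.finishApplication("managed-app-1");
        rm.finishApplication("managed-app-2");

        System.out.println(rm.isInStateStore("managed-app-1")); // false: evicted
        System.out.println(rm.isInStateStore("unmanaged-app")); // true: leaked
    }
}
```

In this model the unmanaged application stays in the store no matter how many other
applications complete, which is why the recovered UMA needs to go through the same final
transition as any other completed application.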

> RM should recover only Managed AMs
> ----------------------------------
>                 Key: YARN-1815
>                 URL: https://issues.apache.org/jira/browse/YARN-1815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, yarn-1815-2.patch,
>
> RM should not recover unmanaged AMs until YARN-1823 is fixed.

This message was sent by Atlassian JIRA