hadoop-yarn-issues mailing list archives

From "Subru Krishnan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-1815) Work preserving recovery of Unmanaged AMs
Date Thu, 26 May 2016 23:55:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303192#comment-15303192 ]

Subru Krishnan edited comment on YARN-1815 at 5/26/16 11:54 PM:
----------------------------------------------------------------

I tested the following scenarios:
For failover:
  * Submit an unmanaged AM and fail over the RM; the unmanaged AM is able to re-register and
    complete successfully. Verified that all its containers are preserved and no work is lost.

For recording state:
  * An unmanaged AM that completes successfully: verified that its state is SUCCEEDED
  * An unmanaged AM that is killed during execution: verified that its state is KILLED
  * An unmanaged AM that fails (essentially times out): verified that its state is FAILED
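The three state-recording checks amount to a small outcome-to-final-state table that must be persisted for unmanaged AMs. The sketch below is a minimal Python illustration under stated assumptions, not the actual RM code: `StateStore`, `finish_unmanaged_am` and the outcome names are hypothetical stand-ins for the RM StateStore and the AM lifecycle.

```python
from enum import Enum


class FinalState(Enum):
    SUCCEEDED = "SUCCEEDED"
    KILLED = "KILLED"
    FAILED = "FAILED"


# Hypothetical outcome names, one per test case in the comment.
OUTCOME_TO_STATE = {
    "completed": FinalState.SUCCEEDED,  # AM finishes its work and unregisters
    "killed": FinalState.KILLED,        # application killed during execution
    "timed_out": FinalState.FAILED,     # AM fails, essentially by timing out
}


class StateStore:
    """Hypothetical in-memory stand-in for the RM StateStore."""

    def __init__(self):
        self._final_states = {}

    def record_final_state(self, app_id, state):
        self._final_states[app_id] = state

    def final_state(self, app_id):
        # None means no final state was recorded, i.e. the app is still running.
        return self._final_states.get(app_id)


def finish_unmanaged_am(store, app_id, outcome):
    """Persist the final state of an unmanaged AM, as is done for managed AMs."""
    state = OUTCOME_TO_STATE[outcome]
    store.record_final_state(app_id, state)
    return state


if __name__ == "__main__":
    store = StateStore()
    for app_id, outcome in [("app_1", "completed"), ("app_2", "killed"), ("app_3", "timed_out")]:
        finish_unmanaged_am(store, app_id, outcome)
        print(app_id, store.final_state(app_id).value)
```

Run as a script, this prints one line per application: `app_1 SUCCEEDED`, `app_2 KILLED`, `app_3 FAILED`.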

> Work preserving recovery of Unmanaged AMs
> -----------------------------------------
>
>                 Key: YARN-1815
>                 URL: https://issues.apache.org/jira/browse/YARN-1815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Karthik Kambatla
>            Assignee: Subru Krishnan
>            Priority: Critical
>         Attachments: Unmanaged AM recovery.png, YARN-1815-v3.patch, yarn-1815-1.patch, yarn-1815-2.patch, yarn-1815-2.patch
>
>
> Currently, work-preserving RM restart recovers unmanaged AMs, but it has two shortcomings:
> all running containers are killed, and completed unmanaged AMs are also recovered because we
> do _not_ record the final state of unmanaged AMs in the RM StateStore. This JIRA proposes to
> address both shortcomings so that work-preserving recovery of unmanaged AMs works exactly as
> it does for managed AMs.
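The intended recovery behaviour can be illustrated with a minimal Python sketch. It is an assumption-laden toy model, not the ResourceManager's recovery code: `AppRecord`, `recover` and the `"RUNNING"` status are hypothetical names. Apps with a recorded final state come back as completed instead of being recovered as live, and apps still running keep their containers when recovery is work-preserving.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AppRecord:
    """Hypothetical persisted record for one application."""
    app_id: str
    unmanaged: bool
    final_state: Optional[str] = None  # e.g. "SUCCEEDED"; None while still running
    containers: List[str] = field(default_factory=list)


def recover(records: List[AppRecord], work_preserving: bool = True) -> Dict[str, dict]:
    """Rebuild app state after an RM restart or failover (toy model).

    The `unmanaged` flag is deliberately not consulted: unmanaged AMs are
    recovered exactly like managed ones.
    """
    recovered = {}
    for rec in records:
        if rec.final_state is not None:
            # Final state was recorded, so the app is restored as completed.
            recovered[rec.app_id] = {"status": rec.final_state, "containers": []}
        else:
            # Still running: keep its containers so no work is lost.
            kept = list(rec.containers) if work_preserving else []
            recovered[rec.app_id] = {"status": "RUNNING", "containers": kept}
    return recovered


if __name__ == "__main__":
    state = recover([
        AppRecord("app_1", unmanaged=True, containers=["c1", "c2"]),
        AppRecord("app_2", unmanaged=True, final_state="SUCCEEDED"),
    ])
    print(state["app_1"]["containers"])  # ['c1', 'c2']
    print(state["app_2"]["status"])      # SUCCEEDED
```

Keying the decision only on the recorded final state is what makes the two shortcomings disappear together. Once the final state is persisted for unmanaged AMs, completed ones are no longer treated as live. Running ones go through the same container-preserving path as managed AMs.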



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
