apex-dev mailing list archives

From "Sandesh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXCORE-426) Support work preserving AM recovery
Date Thu, 02 Mar 2017 02:14:45 GMT

    [ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891488#comment-15891488 ]

Sandesh commented on APEXCORE-426:
----------------------------------

Adding an answer to the question asked by [~PramodSSImmaneni]

Q:
Just for completeness, can you explain on the JIRA what happens if plan changes were saved
but Stram was shut down before it could effect them? Will the new instance make those
changes?

A:
When Stram recovers, it uses the last checkpointed plan. If the running containers differ
from that plan, the following happens:
1. Containers unknown to Stram are killed/rejected.
2. Containers that are not responding are rescheduled.
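
The two rules above can be sketched as a small reconciliation step. This is only an
illustration, not the actual Stram code: the class RecoveryReconciler, the Actions holder,
and the reconcile method are hypothetical names, and containers are reduced to plain string
IDs. It assumes the recovered master knows the container IDs in the last checkpointed plan
and the IDs of containers that have reported back since the restart.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names do not come from the Apex code base.
class RecoveryReconciler {

    /** Outcome of comparing the checkpointed plan with the containers that reported in. */
    static final class Actions {
        final Set<String> keep = new TreeSet<>();        // planned and responding
        final Set<String> reschedule = new TreeSet<>();  // planned but not responding
        final Set<String> kill = new TreeSet<>();        // responding but not in the plan
    }

    static Actions reconcile(Set<String> plannedIds, Set<String> respondingIds) {
        Actions actions = new Actions();
        // Rule 1: a container the recovered plan does not know about is killed/rejected.
        for (String id : respondingIds) {
            if (!plannedIds.contains(id)) {
                actions.kill.add(id);
            }
        }
        // Rule 2: a planned container that is not responding is rescheduled;
        // a planned container that is responding keeps running untouched.
        for (String id : plannedIds) {
            if (respondingIds.contains(id)) {
                actions.keep.add(id);
            } else {
                actions.reschedule.add(id);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Set<String> planned = new TreeSet<>(Arrays.asList("c1", "c2", "c3"));
        Set<String> responding = new TreeSet<>(Arrays.asList("c1", "c3", "c9"));
        Actions actions = reconcile(planned, responding);
        System.out.println("keep=" + actions.keep
                + " reschedule=" + actions.reschedule
                + " kill=" + actions.kill);
    }
}
```

In this example, c2 is in the plan but silent, so it lands in the reschedule set, while c9
reports in without being part of the plan and lands in the kill set.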


> Support work preserving AM recovery
> -----------------------------------
>
>                 Key: APEXCORE-426
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-426
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Thomas Weise
>            Assignee: Sandesh
>              Labels: apex-hadoop-version
>
> On app master failure, the streaming containers should continue running. 
> As of 2.2, YARN automatically terminates all containers and the replacement app master
> relaunches them. Once we move to a newer minimum Hadoop version, we should leverage
> work-preserving restart.
> The mechanism in Apex containers to locate the new master process is already in place.
>  
> Test Cases:
> 1. Kill the app master: only the app master's container ID should change; all other
> container IDs should remain the same.
> 2. Kill the app master and a few other containers; make sure that the killed containers
> are recovered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
