flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4356) new JobManager HA
Date Thu, 29 Jun 2017 12:47:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068283#comment-16068283 ]

Till Rohrmann commented on FLINK-4356:
--------------------------------------

I think this is an issue for the dispatcher component/application master rather than the
JobManager, since it is the former's responsibility to restart failed JobManagers. I am
therefore closing this issue.

> new JobManager HA
> -----------------
>
>                 Key: FLINK-4356
>                 URL: https://issues.apache.org/jira/browse/FLINK-4356
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Cluster Management
>            Reporter: jingzhang
>
> 1. For standalone mode, the LocalDispatcher watches the JobMaster:
> The LocalDispatcher detects the failure of the JobMaster, recovers the JobGraph and libraries from persistent storage, and spawns a new JobManager.
> The new JobMaster competes for leadership and saves its address to ZooKeeper storage.
> The new JobMaster registers at the ResourceManager.
> The new JobMaster recovers the execution of its job (execution graph) from the latest completed checkpoint.
> 2. For YARN mode, the YarnApplicationMasterRunner creates a ProcessReaper for the JobMaster:
> The ProcessReaper monitors the JobMaster and kills the JVM upon JobMaster termination.
> YARN will create a new AppMaster, which contains a new JobManager; the JobGraph and libraries are retrieved as startup artifacts.
> The new JobMaster competes for leadership and saves its address to ZooKeeper storage.
> The new JobMaster registers at the ResourceManager.
> The new JobMaster recovers the execution of its job (execution graph) from the latest completed checkpoint.
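
The standalone-mode sequence quoted above could be sketched roughly as follows. This is only an illustrative model in plain Python, not Flink code: the classes `PersistentStore`, `ResourceManager`, `JobMaster` and `LocalDispatcher` and all of their fields and methods are hypothetical stand-ins named after the components in the description. Leader election is reduced to a single uncontended write, which is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PersistentStore:
    """Hypothetical stand-in for persistent storage plus ZooKeeper."""
    job_graphs: Dict[str, str] = field(default_factory=dict)
    libraries: Dict[str, List[str]] = field(default_factory=dict)
    completed_checkpoints: Dict[str, List[int]] = field(default_factory=dict)
    leader_address: Dict[str, str] = field(default_factory=dict)


@dataclass
class ResourceManager:
    """Hypothetical registry of JobMaster addresses."""
    registered: List[str] = field(default_factory=list)

    def register(self, address: str) -> None:
        self.registered.append(address)


@dataclass
class JobMaster:
    job_id: str
    address: str
    job_graph: str
    libraries: List[str]
    alive: bool = True
    restored_checkpoint: Optional[int] = None


class LocalDispatcher:
    """Hypothetical dispatcher that replaces a failed JobMaster."""

    def __init__(self, store: PersistentStore, rm: ResourceManager) -> None:
        self.store = store
        self.rm = rm

    def recover(self, failed: JobMaster, new_address: str) -> JobMaster:
        # Step 1: only act once the failure has been detected.
        if failed.alive:
            raise ValueError("JobMaster is still alive; nothing to recover")

        # Step 2: recover the job graph and libraries from persistent storage
        # and spawn a replacement.
        job_id = failed.job_id
        new_master = JobMaster(
            job_id=job_id,
            address=new_address,
            job_graph=self.store.job_graphs[job_id],
            libraries=list(self.store.libraries.get(job_id, [])),
        )

        # Step 3: "compete for leadership". Assumption: a single candidate,
        # so the address is simply written to the store.
        self.store.leader_address[job_id] = new_address

        # Step 4: register at the ResourceManager.
        self.rm.register(new_address)

        # Step 5: restore from the latest completed checkpoint, if any.
        checkpoints = self.store.completed_checkpoints.get(job_id, [])
        new_master.restored_checkpoint = max(checkpoints) if checkpoints else None
        return new_master
```

In the YARN variant described above, steps 3 to 5 would be the same; only the detection and respawn (ProcessReaper plus a new AppMaster) differ, and the job graph and libraries would arrive as startup artifacts rather than being read from the store.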



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
