flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ufuk Celebi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-3539) Sync running Execution and Task instances via heartbeats
Date Mon, 29 Feb 2016 14:45:18 GMT
Ufuk Celebi created FLINK-3539:

             Summary: Sync running Execution and Task instances via heartbeats
                 Key: FLINK-3539
                 URL: https://issues.apache.org/jira/browse/FLINK-3539
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Runtime
            Reporter: Ufuk Celebi

[~StephanEwen] pointed out that it is possible for the job manager and task manager state
to get out of sync. If for example a cancel message from the job manager to the task manager
is not delivered, the Execution will be failed at the job manager, but the task will keep
on running at the task manager.

A simple way to prevent such situations is the following:
- The task manager and job manager heartbeats add information about currently running tasks/executions
- If a task manager reports a task, which is not a running Execution, that task is cancelled
- If a job manager reports a running execution, which is not a running task, the execution
is failed

This message was sent by Atlassian JIRA

View raw message