flink-dev mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1953) Rework Checkpoint Coordinator
Date Tue, 28 Apr 2015 12:49:06 GMT
Stephan Ewen created FLINK-1953:

             Summary: Rework Checkpoint Coordinator
                 Key: FLINK-1953
                 URL: https://issues.apache.org/jira/browse/FLINK-1953
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 0.9
            Reporter: Stephan Ewen
            Assignee: Stephan Ewen
             Fix For: 0.9

The checkpoint coordinator currently has no tests and does not robustly handle a variety of situations.
In particular, I propose to add:

 - Better configurability of which tasks receive the trigger checkpoint messages, which tasks
need to acknowledge the checkpoint, and which tasks need to receive confirmation messages.

 - Checkpoint timeouts, such that incomplete checkpoints are guaranteed to be cleaned up after
a while, regardless of whether other checkpoints complete successfully.

 - Better sanity checking of messages and fields, to properly handle or ignore messages for old or expired
checkpoints, or invalidly routed messages.

 - Better handling of checkpoint attempts at points where the execution has just failed or
is currently being canceled.

 - A good set of tests.
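
To make the proposed behaviors concrete, here is a minimal sketch in Python (chosen only for brevity; it is not Flink code). Every name in it (CheckpointCoordinatorSketch, PendingCheckpoint, to_trigger, to_ack, to_confirm, timeout_s, running) is hypothetical and chosen for illustration. It assumes three separately configurable task lists, a per-checkpoint timeout, and a flag that is cleared when the execution fails or is canceled. Expiry is checked lazily on each call here; a real coordinator would presumably run expire_old from a timer so that cleanup does not depend on further messages arriving.

```python
import time
from dataclasses import dataclass


@dataclass
class PendingCheckpoint:
    checkpoint_id: int
    started_at: float
    awaiting_ack: set  # task ids that still have to acknowledge


class CheckpointCoordinatorSketch:
    """Hypothetical, simplified coordinator; not Flink's actual class."""

    def __init__(self, to_trigger, to_ack, to_confirm, timeout_s,
                 clock=time.monotonic):
        self.to_trigger = list(to_trigger)  # tasks that receive the trigger message
        self.to_ack = set(to_ack)           # tasks that must acknowledge
        self.to_confirm = list(to_confirm)  # tasks notified on completion
        self.timeout_s = timeout_s
        self.clock = clock                  # injectable so tests can control time
        self.pending = {}                   # checkpoint id -> PendingCheckpoint
        self.next_id = 1
        self.running = True                 # set to False on failure or cancellation

    def trigger(self):
        """Start a checkpoint; return its id, or None if the job is not running."""
        self.expire_old()
        if not self.running:
            return None
        cid = self.next_id
        self.next_id += 1
        self.pending[cid] = PendingCheckpoint(cid, self.clock(), set(self.to_ack))
        return cid

    def acknowledge(self, checkpoint_id, task):
        """Return the tasks to confirm if this ack completes the checkpoint, else []."""
        self.expire_old()
        p = self.pending.get(checkpoint_id)
        if p is None or task not in p.awaiting_ack:
            # Unknown, expired, duplicate, or misrouted message: ignore it.
            return []
        p.awaiting_ack.discard(task)
        if p.awaiting_ack:
            return []
        del self.pending[checkpoint_id]
        return list(self.to_confirm)

    def expire_old(self):
        """Drop pending checkpoints older than the timeout; return their ids."""
        now = self.clock()
        expired = [cid for cid, p in self.pending.items()
                   if now - p.started_at >= self.timeout_s]
        for cid in expired:
            del self.pending[cid]
        return expired
```

Passing the clock in as a parameter is one way to keep such a coordinator testable without sleeping: a test can advance a fake clock past the timeout and then check that a late acknowledgement is ignored.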

This message was sent by Atlassian JIRA
