flink-dev mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4818) RestartStrategy should track how many failed restore attempts the same checkpoint has and fall back to earlier checkpoints
Date Wed, 12 Oct 2016 19:20:20 GMT
Stephan Ewen created FLINK-4818:
-----------------------------------

             Summary: RestartStrategy should track how many failed restore attempts the same
checkpoint has and fall back to earlier checkpoints
                 Key: FLINK-4818
                 URL: https://issues.apache.org/jira/browse/FLINK-4818
             Project: Flink
          Issue Type: Sub-task
          Components: Distributed Coordination
            Reporter: Stephan Ewen


The restart strategies can use the exception information from FLINK-4816 to keep track of
how often a checkpoint restore has failed. After a certain number of consecutive failures,
they should take earlier completed checkpoints as recovery points.
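
The fallback described above could be sketched roughly as follows. This is only an illustration under stated assumptions: RestoreFallbackTracker, its methods, and the maxFailuresPerCheckpoint threshold are hypothetical names and are not part of Flink's API. It also assumes the caller can tell restore failures apart from other failures (the kind of exception information FLINK-4816 is meant to provide) and passes in the completed checkpoint IDs, newest first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not an existing Flink class. */
class RestoreFallbackTracker {

    // Assumed threshold: consecutive restore failures tolerated per checkpoint.
    private final int maxFailuresPerCheckpoint;

    // Completed checkpoint IDs still eligible as recovery points, newest first.
    private final Deque<Long> candidates = new ArrayDeque<>();

    // Consecutive failed restores of the checkpoint at the head of the deque.
    private int consecutiveFailures = 0;

    RestoreFallbackTracker(List<Long> completedCheckpointIdsNewestFirst,
                           int maxFailuresPerCheckpoint) {
        if (maxFailuresPerCheckpoint < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.candidates.addAll(completedCheckpointIdsNewestFirst);
        this.maxFailuresPerCheckpoint = maxFailuresPerCheckpoint;
    }

    /** Checkpoint to restore from next, or empty if none is left. */
    OptionalLong currentRestorePoint() {
        Long head = candidates.peekFirst();
        return head == null ? OptionalLong.empty() : OptionalLong.of(head);
    }

    /** Record a failed restore; drop the checkpoint once the threshold is hit. */
    void onRestoreFailure() {
        if (candidates.isEmpty()) {
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxFailuresPerCheckpoint) {
            candidates.pollFirst();   // fall back to the next earlier checkpoint
            consecutiveFailures = 0;  // the counter applies per checkpoint
        }
    }

    /** A successful restore ends the run of consecutive failures. */
    void onRestoreSuccess() {
        consecutiveFailures = 0;
    }
}
```

With checkpoints 3, 2, 1 and a threshold of 2, two failed restores of checkpoint 3 would make checkpoint 2 the next recovery point. The sketch deliberately does not decide where such logic should live.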

It is open for discussion whether the restart strategies are the right place to implement this,
or whether it is an orthogonal feature that belongs in the checkpoint coordinator (which
knows how many checkpoints are available) or in a separate class altogether.



