flink-dev mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-3187) Decouple restart strategy from ExecutionGraph
Date Fri, 18 Dec 2015 17:31:46 GMT
Till Rohrmann created FLINK-3187:

             Summary: Decouple restart strategy from ExecutionGraph
                 Key: FLINK-3187
                 URL: https://issues.apache.org/jira/browse/FLINK-3187
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Till Rohrmann
            Assignee: Till Rohrmann
            Priority: Minor

Currently, the {{ExecutionGraph}} supports the following restart logic: whenever a failure
occurs and the number of restart attempts is not yet exhausted, it waits for a fixed amount
of time and then tries to restart. This behaviour is controlled by the configuration parameters
{{execution-retries.default}} and {{execution-retries.delay}}.

I propose to decouple the restart logic from the {{ExecutionGraph}} by introducing a
strategy pattern. This would allow us not only to define job-specific restart behaviour
but also to implement different restart strategies. Conceivable strategies include fixed
timeout restart, exponential backoff restart, partial topology restarts, etc.
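
To make the proposal more concrete, here is a minimal sketch of what such a strategy pattern might look like. All names in it ({{RestartStrategy}}, {{FixedDelayRestartStrategy}}, {{ExponentialBackoffRestartStrategy}}, {{RestartDemo}}) are hypothetical and chosen for illustration only; they are not an existing Flink API, and partial topology restarts are not covered. The idea is that the {{ExecutionGraph}} would only ask the strategy whether a restart is allowed and how long to wait, instead of hard-coding the fixed-delay logic itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the ExecutionGraph would depend only on this.
interface RestartStrategy {
    // True while further restart attempts are allowed.
    boolean canRestart();

    // Records one restart attempt and returns the delay to wait before it.
    long nextDelayMillis();
}

// Mirrors the current behaviour: a bounded number of attempts, fixed delay.
final class FixedDelayRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long delayMillis;
    private int attempts;

    FixedDelayRestartStrategy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts exhausted");
        }
        attempts++;
        return delayMillis;
    }
}

// Doubles the delay on every attempt, capped at maxDelayMillis.
final class ExponentialBackoffRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long initialDelayMillis;
    private final long maxDelayMillis;
    private int attempts;

    ExponentialBackoffRestartStrategy(int maxAttempts, long initialDelayMillis, long maxDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts exhausted");
        }
        long delay = initialDelayMillis;
        for (int i = 0; i < attempts; i++) {
            // Jump to the cap instead of doubling past it (also avoids overflow).
            delay = delay > maxDelayMillis / 2 ? maxDelayMillis : delay * 2;
        }
        attempts++;
        return Math.min(delay, maxDelayMillis);
    }
}

// Stand-in for the caller: collects the delays a job would wait through
// if every restart failed again, until the strategy gives up.
final class RestartDemo {
    static List<Long> delaysUntilExhausted(RestartStrategy strategy) {
        List<Long> delays = new ArrayList<>();
        while (strategy.canRestart()) {
            delays.add(strategy.nextDelayMillis());
        }
        return delays;
    }
}
```

With this shape, a job-specific restart behaviour amounts to handing a different {{RestartStrategy}} instance to each job, and a new strategy can be added without touching the code that drives the restart.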

This change is a preliminary step towards a restart strategy that scales down the parallelism
of a job if not enough slots are available.

