flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (FLINK-9635) Local recovery scheduling can cause spread out of tasks
Date Wed, 31 Oct 2018 10:05:01 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-9635.
--------------------------------
      Resolution: Fixed
    Release Note: With the improvements to Flink's scheduling, a recovery can no longer require
more slots than were needed before the failure when local recovery is enabled. Consequently, we
encourage our users to use the local recovery feature, which can be enabled with
`state.backend.local-recovery: true`.

> Local recovery scheduling can cause spread out of tasks
> -------------------------------------------------------
>
>                 Key: FLINK-9635
>                 URL: https://issues.apache.org/jira/browse/FLINK-9635
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0, 1.6.2
>            Reporter: Till Rohrmann
>            Assignee: Stefan Richter
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> In order to make local recovery work, Flink's scheduling was changed such that it tries
> to reschedule each task to its previous location. In order not to occupy slots which have
> the state of other tasks cached, the strategy requests a new slot if the old slot, identified
> by the previous allocation ID, is no longer present. This also applies to newly allocated
> slots, because there is no distinction between new and already used slots. As a result, every
> task can get deployed to its own slot if, for example, the {{SlotPool}} has released all slots
> in the meantime. The consequence could be that a job can no longer be executed after a failure
> because it needs more slots than before.
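
The spreading described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Flink's actual scheduler code: the names `schedule` and `slots_used`, the task names, and the allocation IDs "a" and "b" are all hypothetical, and slot sharing is modelled simply as several tasks remembering the same previous allocation ID.

```python
from itertools import count


def schedule(tasks, pool_slots):
    """Toy model of the pre-fix strategy (hypothetical, not Flink code).

    tasks: dict mapping task name -> previous allocation ID.
    pool_slots: allocation IDs the slot pool currently holds.
    Returns a dict mapping task name -> assigned allocation ID.
    """
    pool = set(pool_slots)
    fresh_ids = count(1)
    assignment = {}
    for task, previous in tasks.items():
        if previous in pool:
            # The previous slot still exists: go back to it.
            assignment[task] = previous
        else:
            # The previous slot is gone. Any other slot might hold another
            # task's cached state, and new slots are not distinguished from
            # used ones, so the task always requests a fresh slot.
            new_id = f"new-{next(fresh_ids)}"
            pool.add(new_id)
            assignment[task] = new_id
    return assignment


def slots_used(assignment):
    """Number of distinct slots occupied by an assignment."""
    return len(set(assignment.values()))


# Four tasks that shared two slots before the failure.
tasks = {"source-0": "a", "map-0": "a", "source-1": "b", "map-1": "b"}

print(slots_used(schedule(tasks, {"a", "b"})))  # slots still present
print(slots_used(schedule(tasks, set())))       # pool released all slots
```

In this model the job fits into two slots as long as the old slots are still present, but it needs four slots once the pool has released them, which is the situation in which a job may no longer be executable after a failure.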



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
