flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-9635) Local recovery scheduling can cause spread out of tasks
Date Thu, 21 Jun 2018 08:07:00 GMT
Till Rohrmann created FLINK-9635:

             Summary: Local recovery scheduling can cause spread out of tasks
                 Key: FLINK-9635
                 URL: https://issues.apache.org/jira/browse/FLINK-9635
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.5.0
            Reporter: Till Rohrmann
             Fix For: 1.6.0, 1.5.1

In order to make local recovery work, Flink's scheduling was changed such that it tries to
reschedule each task to its previous location. In order not to occupy slots that hold cached
state of other tasks, the strategy requests a new slot if the old slot, identified by the
previous allocation id, is no longer present. This also applies to newly allocated slots,
because no distinction is made between new and already used slots. As a result, every task
can end up deployed to its own slot, for example if the {{SlotPool}} has released all slots
in the meantime.
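
The effect described above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions, not Flink's actual scheduler or {{SlotPool}} code: the function name, the string allocation ids, and the premise that all tasks belong to one group that could share a single slot are hypothetical and chosen for illustration.

```python
from itertools import count
from typing import Dict, Iterable, List


def schedule_with_local_recovery(
    tasks: Iterable[str],
    previous_allocation: Dict[str, str],
    live_slots: Iterable[str],
) -> Dict[str, str]:
    """Toy model (hypothetical, not Flink code): map each task to an allocation id.

    Assumption: every task could share one slot with the others, so the
    ideal placement uses a single slot.
    """
    new_ids = (f"new-{i}" for i in count(1))
    # Slots currently known to the pool, each with the tasks placed into it.
    slots: Dict[str, List[str]] = {alloc_id: [] for alloc_id in live_slots}
    placement: Dict[str, str] = {}

    for task in tasks:
        wanted = previous_allocation.get(task)
        if wanted in slots:
            # The previous slot still exists: go back to it (local recovery).
            target = wanted
        else:
            # The previous slot is gone. Slots allocated earlier in this loop
            # are not recognised as "fresh", so they are skipped like any slot
            # that might hold another task's state, and a new slot is requested.
            target = next(new_ids)
            slots[target] = []
        slots[target].append(task)
        placement[task] = target
    return placement


tasks = ["source", "map", "sink"]
previous = {task: "a1" for task in tasks}  # all three shared slot "a1" before the failure

# Old slot still present: all tasks return to it.
print(schedule_with_local_recovery(tasks, previous, live_slots=["a1"]))
# Pool released all slots in the meantime: each task ends up in its own new slot.
print(schedule_with_local_recovery(tasks, previous, live_slots=[]))
```

In this model the spread-out disappears once the strategy can tell that a slot was newly allocated and therefore holds no cached state, which is the distinction the description says is missing.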

This message was sent by Atlassian JIRA
