flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-9908) Inconsistent state of SlotPool after ExecutionGraph cancellation
Date Sun, 22 Jul 2018 18:14:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-9908:
---------------------------------
    Component/s: Distributed Coordination

> Inconsistent state of SlotPool after ExecutionGraph cancellation 
> -----------------------------------------------------------------
>
>                 Key: FLINK-9908
>                 URL: https://issues.apache.org/jira/browse/FLINK-9908
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.1, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.5.2, 1.6.0, 1.7.0
>
>
> If the {{ExecutionGraph}} is scheduled and cancelled concurrently, requested {{Slots}} may
> not be properly returned to the {{SlotPool}}. This leaves the {{SlotPool}} in an inconsistent
> state: it still considers some of its slots occupied even though the respective {{Execution}}
> has already been cancelled.
> The problem seems to be caused by propagating the cancellation of the overall scheduling
> future to the individual scheduling futures. If an individual scheduling future is cancelled,
> the callback that produces its value and also handles the failure case is never called.
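
The mechanism described above can be reproduced with plain java.util.concurrent.CompletableFuture. The following is only a minimal sketch, not Flink code: the class SlotCancellationSketch, the method occupiedAfterCancel, and the counter standing in for the SlotPool bookkeeping are all hypothetical names chosen for illustration. It assumes the individual scheduling future is a dependent future created with handle(...). The second variant, which leaves the dependent future alone and checks a flag inside the callback, is just one conceivable way to keep the callback alive and is not necessarily the fix applied for this issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class SlotCancellationSketch {

    /**
     * Simulates one slot request racing with a cancellation and returns the
     * number of slots the (hypothetical) pool still believes are occupied.
     */
    static int occupiedAfterCancel(boolean cancelDependentFuture) {
        // Stand-in for the pool's bookkeeping of occupied slots (assumption).
        AtomicInteger occupied = new AtomicInteger();
        AtomicBoolean executionCancelled = new AtomicBoolean();

        // Pending slot request, completed later by the pool.
        CompletableFuture<String> slotRequest = new CompletableFuture<>();

        // Individual scheduling future: its callback is the only place
        // where a slot that arrives too late is handed back.
        CompletableFuture<Void> scheduling = slotRequest.handle((slot, failure) -> {
            if (slot != null && executionCancelled.get()) {
                occupied.decrementAndGet(); // return the slot to the pool
            }
            return null;
        });

        // The cancellation arrives before the slot request is fulfilled.
        executionCancelled.set(true);
        if (cancelDependentFuture) {
            // Completes 'scheduling' exceptionally right away, so the
            // handle(...) callback above is skipped when the slot arrives.
            scheduling.cancel(false);
        }

        // The pool fulfils the request and marks the slot as occupied.
        occupied.incrementAndGet();
        slotRequest.complete("slot-1");

        return occupied.get();
    }

    public static void main(String[] args) {
        System.out.println("cancel propagated to scheduling future: "
                + occupiedAfterCancel(true) + " slot(s) still marked occupied");
        System.out.println("callback left in place: "
                + occupiedAfterCancel(false) + " slot(s) still marked occupied");
    }
}
```

In this sketch the first call reports one slot still marked occupied, because cancelling the dependent future completes it before the callback can run. The second call reports zero, because the callback runs when the slot arrives and returns it.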



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
