flink-dev mailing list archives

From "Paris Carbone (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-3256) Invalid execution graph cleanup for jobs with colocation groups
Date Mon, 18 Jan 2016 18:52:39 GMT
Paris Carbone created FLINK-3256:
------------------------------------

             Summary: Invalid execution graph cleanup for jobs with colocation groups
                 Key: FLINK-3256
                 URL: https://issues.apache.org/jira/browse/FLINK-3256
             Project: Flink
          Issue Type: Bug
          Components: Distributed Runtime
            Reporter: Paris Carbone
            Assignee: Paris Carbone
            Priority: Blocker


Currently, when an execution graph is restarted, colocation constraints are cleaned up per
ExecutionJobVertex: each vertex resets the constraints of the colocation group it belongs to.

Because a colocation group can be shared by several vertices, this can lead to invalid
reconfiguration upon a restart or any other activity that relies on state cleanup of the
execution graph. For example, upon restarting a DataStream job with iterations, the following
steps are executed:

1) IterationSource colocation group constraints are reset
2) New IterationSource colocation group constraints are generated
3) IterationSource subtasks are scheduled with current colocation constraints
4) IterationSink colocation group constraints are reset
5) New IterationSink colocation group constraints are generated
6) IterationSink subtasks are scheduled with different colocation constraints, so they are not
colocated with the sources and also demand more slots from the scheduler.

This can be trivially fixed by resetting colocation groups independently of the
ExecutionJobVertices, so that each group is updated once per reconfiguration.
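
The restart sequence and the proposed fix can be illustrated with a minimal, self-contained
sketch. It does not use Flink's runtime classes: Group, Vertex, restartPerVertex and
restartOncePerGroup are hypothetical stand-ins invented for this illustration, and each
constraint is modelled as a plain Object whose identity represents one shared slot
requirement. The sketch assumes one source vertex and one sink vertex sharing a single
colocation group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; these are not Flink's classes.
class ColocationResetSketch {

    /** Stand-in for a colocation group shared by several job vertices. */
    static class Group {
        List<Object> constraints = new ArrayList<>();

        /** Discards the old constraints and creates one fresh constraint per subtask. */
        void reset(int parallelism) {
            constraints = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                constraints.add(new Object());
            }
        }
    }

    /** Stand-in for a job vertex; records the constraints its subtasks were scheduled with. */
    static class Vertex {
        final Group group;
        List<Object> scheduledWith = new ArrayList<>();

        Vertex(Group group) {
            this.group = group;
        }

        void schedule() {
            scheduledWith = new ArrayList<>(group.constraints);
        }
    }

    /** Counts distinct constraint objects (by identity) used across all vertices. */
    static int distinctConstraints(List<Vertex> vertices) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (Vertex v : vertices) {
            seen.addAll(v.scheduledWith);
        }
        return seen.size();
    }

    /** Reported behaviour: every vertex resets the shared group right before it is scheduled. */
    static int restartPerVertex(int parallelism) {
        Group shared = new Group();
        List<Vertex> vertices = Arrays.asList(new Vertex(shared), new Vertex(shared));
        for (Vertex v : vertices) {
            v.group.reset(parallelism); // steps 1-2 and 4-5 in the report
            v.schedule();               // steps 3 and 6 in the report
        }
        return distinctConstraints(vertices);
    }

    /** Proposed fix: reset each group once, independently of the vertices, then schedule. */
    static int restartOncePerGroup(int parallelism) {
        Group shared = new Group();
        List<Vertex> vertices = Arrays.asList(new Vertex(shared), new Vertex(shared));
        Set<Group> groups = Collections.newSetFromMap(new IdentityHashMap<Group, Boolean>());
        for (Vertex v : vertices) {
            groups.add(v.group);
        }
        for (Group g : groups) {
            g.reset(parallelism);
        }
        for (Vertex v : vertices) {
            v.schedule();
        }
        return distinctConstraints(vertices);
    }

    public static void main(String[] args) {
        System.out.println("per-vertex reset: " + restartPerVertex(2) + " distinct constraints");
        System.out.println("per-group reset:  " + restartOncePerGroup(2) + " distinct constraints");
    }
}
```

With parallelism 2, the per-vertex variant ends up with 4 distinct constraints (sources and
sinks no longer share any), while the per-group variant ends up with 2, one per colocated
source/sink pair. In this simplified model, that difference corresponds to the extra slot
demand described above.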



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)