flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6833) Race condition: Asynchronous checkpointing task can fail completed StreamTask
Date Thu, 08 Jun 2017 08:34:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042408#comment-16042408 ]

ASF GitHub Bot commented on FLINK-6833:

Github user StefanRRichter commented on the issue:

    LGTM +1

> Race condition: Asynchronous checkpointing task can fail completed StreamTask
> -----------------------------------------------------------------------------
>                 Key: FLINK-6833
>                 URL: https://issues.apache.org/jira/browse/FLINK-6833
>             Project: Flink
>          Issue Type: Bug
>          Components: Local Runtime, State Backends, Checkpointing
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
> A {{StreamTask}} that is about to finish, and is thus transitioning its containing {{Task}}
> into the {{ExecutionState.FINISHED}} state, can be failed by a concurrent asynchronous
> checkpointing operation. The problem is that upon termination the {{StreamTask}} cancels all
> concurrent operations (among others, ongoing asynchronous checkpoints). The cancellation of
> the async checkpoint triggers the {{StreamTask#handleAsyncException}} call, which will fail
> the containing {{Task}}. If {{handleAsyncException}} completes before the {{StreamTask}} has
> been properly terminated, then the containing {{Task}} will transition into
> {{ExecutionState.FAILED}} instead of {{ExecutionState.FINISHED}}.
>
> In order to resolve this race condition, we should check in {{StreamTask#handleAsyncException}}
> whether the {{StreamTask}} is still running or has already been terminated. Only in the
> former case should we fail the containing {{Task}}.
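
The proposed check can be illustrated with a minimal, single-threaded Java sketch. It is not Flink's actual code: `SketchState`, `SketchStreamTask`, the `isRunning` flag, and `finish()` are hypothetical stand-ins, and the real `StreamTask`/`Task` classes are considerably more involved. The sketch only assumes that the task marks itself as no longer running before it cancels its asynchronous operations, so that an exception reported by a cancelled checkpoint is ignored rather than failing the task.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for ExecutionState (assumption, not Flink's enum).
enum SketchState { RUNNING, FINISHED, FAILED }

// Hypothetical, simplified stand-in for a StreamTask and its containing Task.
final class SketchStreamTask {
    // Set to false once the task starts terminating; volatile so that
    // asynchronous threads would observe the change.
    private volatile boolean isRunning = true;

    private final AtomicReference<SketchState> state =
            new AtomicReference<>(SketchState.RUNNING);

    SketchState state() {
        return state.get();
    }

    // Called when an asynchronous operation (e.g. a checkpoint) reports an error.
    void handleAsyncException(Throwable cause) {
        if (isRunning) {
            // Still running: the async failure is a genuine task failure.
            state.compareAndSet(SketchState.RUNNING, SketchState.FAILED);
        }
        // Already terminating: ignore, the error stems from our own cancellation.
    }

    // Regular termination path of the task.
    void finish() {
        // Mark the task as terminated BEFORE cancelling async operations.
        isRunning = false;
        // Cancelling an ongoing async checkpoint triggers handleAsyncException.
        handleAsyncException(new CancellationException("async checkpoint cancelled"));
        state.compareAndSet(SketchState.RUNNING, SketchState.FINISHED);
    }
}

final class GuardedAsyncExceptionSketch {
    public static void main(String[] args) {
        SketchStreamTask finishing = new SketchStreamTask();
        finishing.finish();
        System.out.println(finishing.state());

        SketchStreamTask running = new SketchStreamTask();
        running.handleAsyncException(new RuntimeException("checkpoint failed"));
        System.out.println(running.state());
    }
}
```

In this sketch a task that finishes normally ends up in `FINISHED` even though its cancelled checkpoint reports an exception, while a task that is still running is moved to `FAILED` by the same call. Without the `isRunning` check, the first case would end in `FAILED`, which is the behaviour described in the issue.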

This message was sent by Atlassian JIRA
