flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5962) Cancel checkpoint canceller tasks in CheckpointCoordinator
Date Mon, 06 Mar 2017 13:58:32 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897347#comment-15897347 ]

Stephan Ewen commented on FLINK-5962:

I have a pretty big change to the {{PendingCheckpoint}} and {{CheckpointCoordinator}} coming
up, which should go in first; otherwise we would have to completely redo the timer patch anyway.

I think the fix for this issue is actually very small: it simply means adding the cancellation
timer to the {{PendingCheckpoint}} and cancelling it when disposing of the pending checkpoint.
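
A minimal sketch of that idea, not the actual Flink patch: the pending checkpoint keeps the
handle of its timeout task and cancels it on disposal. The names {{PendingCheckpointSketch}},
{{setCancellerHandle}} and {{pendingTimersAfter}} are hypothetical, and the use of a
{{ScheduledThreadPoolExecutor}} as the timer is an assumption made for illustration.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CancellerSketch {

    /** Hypothetical stand-in for PendingCheckpoint; not Flink's actual class. */
    static final class PendingCheckpointSketch {
        private ScheduledFuture<?> cancellerHandle; // guarded by "this"
        private boolean disposed;

        /** Attaches the timeout task; cancels it right away if already disposed. */
        synchronized boolean setCancellerHandle(ScheduledFuture<?> handle) {
            if (disposed) {
                handle.cancel(false);
                return false;
            }
            cancellerHandle = handle;
            return true;
        }

        /** Called on completion, abort, or timeout; also releases the timer task. */
        synchronized void dispose() {
            disposed = true;
            if (cancellerHandle != null) {
                cancellerHandle.cancel(false);
                cancellerHandle = null;
            }
        }

        synchronized boolean isDisposed() {
            return disposed;
        }
    }

    /** Runs checkpoints that complete immediately; returns the timer tasks still queued. */
    static int pendingTimersAfter(int checkpoints, long timeoutMillis) {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        // Without this policy, cancelled tasks stay queued until their delay elapses.
        timer.setRemoveOnCancelPolicy(true);
        try {
            for (int id = 0; id < checkpoints; id++) {
                PendingCheckpointSketch pending = new PendingCheckpointSketch();
                // The canceller disposes the checkpoint if it exceeds the timeout.
                ScheduledFuture<?> handle =
                        timer.schedule(pending::dispose, timeoutMillis, TimeUnit.MILLISECONDS);
                pending.setCancellerHandle(handle);
                pending.dispose(); // simulate a checkpoint that completed in time
            }
            return timer.getQueue().size();
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(pendingTimersAfter(1000, 600_000L));
    }
}
```

In this sketch, cancelling the handle alone would not free the task object, because a
{{ScheduledThreadPoolExecutor}} keeps cancelled tasks in its queue until their delay expires
unless {{setRemoveOnCancelPolicy(true)}} is set. Whether the same concern applies depends on
the timer implementation actually in use.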

My change will only go into {{master}}, so creating a patch for the {{release-1.2}} branch
should be fine.

> Cancel checkpoint canceller tasks in CheckpointCoordinator
> ----------------------------------------------------------
>                 Key: FLINK-5962
>                 URL: https://issues.apache.org/jira/browse/FLINK-5962
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Till Rohrmann
>            Priority: Critical
> The {{CheckpointCoordinator}} registers a canceller task for each running checkpoint.
> The canceller task's responsibility is to cancel a checkpoint if it takes too long to complete.
> We should cancel this task as soon as the checkpoint has been completed, because otherwise
> we will keep many canceller tasks around. This can eventually lead to an OOM exception.

This message was sent by Atlassian JIRA