flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7067) Cancel with savepoint does not restart checkpoint scheduler on failure
Date Mon, 23 Oct 2017 12:46:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215075#comment-16215075 ]

ASF GitHub Bot commented on FLINK-7067:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/4888

    [backport] [FLINK-7067] Resume checkpointing after failed cancel-job-with-savepoint

    This is a backport of #4254. I will merge this as soon as Travis gives the green light.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 7067-backport

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4888
    
----
commit 9226c3a15f8037851110fbdecf775cad99da771f
Author: Ufuk Celebi <uce@apache.org>
Date:   2017-07-04T14:39:02Z

    [hotfix] [tests] Reduce visibility of helper class methods
    
    There is no need to make the helper methods public. No other class
    should even use this inner test helper invokable.

commit c571929ce476f17d02ee22df0b5170b0eb322c2d
Author: Ufuk Celebi <uce@apache.org>
Date:   2017-07-04T15:01:32Z

    [FLINK-7067] [jobmanager] Resume periodic checkpoints after failed cancel-job-with-savepoint
    
    Problem: If a cancel-job-with-savepoint request fails, this has an
    unintended side effect on the respective job if it has periodic
    checkpoints enabled. The periodic checkpoint scheduler is stopped
    before triggering the savepoint, but not restarted if a savepoint
    fails and the job is not cancelled.
    
    This commit makes sure that the periodic checkpoint scheduler is
    restarted iff periodic checkpoints were enabled before.
    
    This closes #4254.

commit 074630a2fbd6dbdc7ff775ee9fb5d46c088dbc6d
Author: Ufuk Celebi <uce@apache.org>
Date:   2017-10-23T12:42:46Z

    [FLINK-7067] [jobmanager] Backport to 1.3

----


> Cancel with savepoint does not restart checkpoint scheduler on failure
> ----------------------------------------------------------------------
>
>                 Key: FLINK-7067
>                 URL: https://issues.apache.org/jira/browse/FLINK-7067
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.1
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
>
>
> The `CancelWithSavepoint` action of the JobManager first stops the checkpoint scheduler,
> then triggers a savepoint, and cancels the job after the savepoint completes.
> If the savepoint fails, the command should not have any side effects and we don't cancel
> the job. However, the checkpoint scheduler is not restarted in that case.
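The control flow described in the issue, together with the fix from the commit (restart the periodic scheduler iff periodic checkpoints were enabled before), can be sketched in Java as follows. This is a minimal, self-contained sketch, not the code from the pull request: `CancelWithSavepointSketch`, `PeriodicScheduler` and `cancelWithSavepoint` are hypothetical names, and `CompletableFuture` stands in for whatever future type the JobManager actually uses.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class CancelWithSavepointSketch {

    /** Hypothetical stand-in for the periodic checkpoint scheduler (not Flink's API). */
    static final class PeriodicScheduler {
        private final boolean periodicConfigured;
        private final AtomicBoolean running = new AtomicBoolean();

        PeriodicScheduler(boolean periodicConfigured) {
            this.periodicConfigured = periodicConfigured;
            // The scheduler only runs if periodic checkpoints are configured.
            this.running.set(periodicConfigured);
        }

        boolean isPeriodicConfigured() { return periodicConfigured; }
        boolean isRunning() { return running.get(); }
        void stop() { running.set(false); }
        void start() { running.set(true); }
    }

    /**
     * Stops the scheduler, triggers a savepoint and cancels the job only if the
     * savepoint succeeds. If it fails, the scheduler is restarted iff periodic
     * checkpoints were configured before the request.
     */
    static CompletableFuture<String> cancelWithSavepoint(
            PeriodicScheduler scheduler,
            Supplier<CompletableFuture<String>> triggerSavepoint,
            Runnable cancelJob) {
        // Remember the configuration before touching the scheduler.
        final boolean restartOnFailure = scheduler.isPeriodicConfigured();
        scheduler.stop();

        CompletableFuture<String> savepoint;
        try {
            savepoint = triggerSavepoint.get();
        } catch (RuntimeException e) {
            // Treat a synchronous trigger failure like an asynchronous one.
            savepoint = new CompletableFuture<>();
            savepoint.completeExceptionally(e);
        }

        return savepoint.whenComplete((path, failure) -> {
            if (failure == null) {
                cancelJob.run(); // savepoint completed: now it is safe to cancel
            } else if (restartOnFailure) {
                scheduler.start(); // undo the side effect of stopping the scheduler
            }
        });
    }

    public static void main(String[] args) {
        PeriodicScheduler scheduler = new PeriodicScheduler(true);
        AtomicBoolean cancelled = new AtomicBoolean();

        CompletableFuture<String> failedSavepoint = new CompletableFuture<>();
        failedSavepoint.completeExceptionally(new RuntimeException("savepoint declined"));

        CompletableFuture<String> result =
                cancelWithSavepoint(scheduler, () -> failedSavepoint, () -> cancelled.set(true));

        System.out.println("failed: " + result.isCompletedExceptionally());   // true
        System.out.println("scheduler running: " + scheduler.isRunning());   // true
        System.out.println("job cancelled: " + cancelled.get());              // false
    }
}
```

The sketch records whether periodic checkpoints were configured before stopping the scheduler, so a job that never had them enabled does not get a scheduler started as a side effect of a failed request.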



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
