flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7783) Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
Date Sun, 22 Oct 2017 09:44:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214255#comment-16214255 ]

ASF GitHub Bot commented on FLINK-7783:

GitHub user aljoscha opened a pull request:


    [FLINK-7783] Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()

    I think this will be the final version for what I started in #4863.
    Now, the code will retrieve checkpoints and succeed if either all of them are read or
if two successive tries read the same set of checkpoints.
    This doesn't duplicate the test anymore but still leaves the questionable (lack of) separation
of concerns in the store.
    R: @StefanRRichter, @tillrohrmann 
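
The retry rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class name `StableRecoverySketch`, the helpers `tryReadAll` and `recover`, the generic handle/checkpoint types, and the `maxAttempts` bound are all hypothetical. The point it shows is that unreadable handles are skipped rather than deleted, and that recovery succeeds once every handle was read or two successive tries return the same set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; names and types are illustrative, not Flink's API.
class StableRecoverySketch {

    // Reads every handle once. A handle that cannot be read is skipped,
    // not deleted, so a temporary DFS outage does not destroy it.
    static <H, C> List<C> tryReadAll(List<H> handles, Function<H, C> reader) {
        List<C> readable = new ArrayList<>();
        for (H handle : handles) {
            try {
                readable.add(reader.apply(handle));
            } catch (RuntimeException e) {
                // Assumption: a failed read surfaces as an unchecked exception.
            }
        }
        return readable;
    }

    // Succeeds if all handles were read, or if two successive tries
    // produced the same list of checkpoints. maxAttempts is an assumed
    // safety bound added for this sketch.
    static <H, C> List<C> recover(List<H> handles, Function<H, C> reader, int maxAttempts) {
        List<C> previous = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<C> current = tryReadAll(handles, reader);
            boolean allRead = current.size() == handles.size();
            boolean stable = current.equals(previous);
            if (allRead || stable) {
                return current;
            }
            previous = current;
        }
        throw new IllegalStateException(
            "Could not read a stable set of checkpoints in " + maxAttempts + " attempts");
    }
}
```

With handles `a`, `b`, `c` and a reader that always fails on `b`, `recover` returns the checkpoints for `a` and `c` after the second try, because both tries agree. If `b` becomes readable on the second try, all three are returned.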

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-7783-zookeeper-state-store-fix-simplified3

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4879

commit ca189b4c44810229331332e397523cba5417b4d6
Author: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Date:   2017-10-22T09:40:43Z

    [FLINK-7783] Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()


> Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
> ------------------------------------------------------------------------------
>                 Key: FLINK-7783
>                 URL: https://issues.apache.org/jira/browse/FLINK-7783
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
> Currently, we always delete checkpoint handles if they (or the data from the DFS) cannot
> be read: https://github.com/apache/flink/blob/91a4b276171afb760bfff9ccf30593e648e91dfb/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L180
> This can lead to problems when the DFS is temporarily unavailable: we could
> delete all checkpoints even though they are still valid.
> A user reported this problem on the mailing list: https://lists.apache.org/thread.html/9dc9b719cf8449067ad01114fedb75d1beac7b4dff171acdcc24903d@%3Cuser.flink.apache.org%3E
