flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7783) Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
Date Mon, 23 Oct 2017 03:32:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214586#comment-16214586 ]

ASF GitHub Bot commented on FLINK-7783:

Github user StefanRRichter commented on the issue:

    I think overall this fix looks very good now. It implements a different strategy for
dealing with problematic checkpoints, one that can recover from more scenarios than the initial
PR which was discussed between @aljoscha and me offline, so I prefer this approach.
    I only had minor comments and one talking point about potential concurrent modifications.
If this turns out to be a non-issue (as I expect), I would approve this for merging.

> Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
> ------------------------------------------------------------------------------
>                 Key: FLINK-7783
>                 URL: https://issues.apache.org/jira/browse/FLINK-7783
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
> Currently, we always delete checkpoint handles if they (or the data from the DFS) cannot
> be read: https://github.com/apache/flink/blob/91a4b276171afb760bfff9ccf30593e648e91dfb/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L180
> This can lead to problems in case the DFS is temporarily not available, i.e. we could
> delete all checkpoints even though they are still valid.
> A user reported this problem on the mailing list: https://lists.apache.org/thread.html/9dc9b719cf8449067ad01114fedb75d1beac7b4dff171acdcc24903d@%3Cuser.flink.apache.org%3E
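
To illustrate the failure mode described in the issue, here is a minimal, language-neutral sketch in Python (not Flink's Java code, and not necessarily the strategy the PR implements). It assumes a recovery loop that walks the checkpoint handles from newest to oldest and, when a read fails, skips the handle instead of deleting it. All names here (`recover_latest`, `TransientReadError`, the `handles` mapping) are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TransientReadError(Exception):
    """Hypothetical stand-in for an I/O failure while reading from the DFS."""


def recover_latest(
    handles: Dict[int, str],
    read: Callable[[str], bytes],
) -> Tuple[Optional[int], List[int]]:
    """Return (id of the newest readable checkpoint, ids that failed to read).

    No entry is removed from `handles` when a read fails, so a temporary
    DFS outage cannot wipe out checkpoints that are still valid.
    """
    failed: List[int] = []
    # Try the newest checkpoint first, then fall back to older ones.
    for checkpoint_id in sorted(handles, reverse=True):
        try:
            read(handles[checkpoint_id])
        except TransientReadError:
            # Remember the failure but keep the handle for a later retry.
            failed.append(checkpoint_id)
            continue
        return checkpoint_id, failed
    return None, failed
```

With a reader that fails only for the newest checkpoint, the sketch falls back to the previous one and leaves all handles in place. With a reader that fails for every checkpoint (the "DFS temporarily not available" case), it returns no checkpoint but still deletes nothing.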

This message was sent by Atlassian JIRA
