Date: Mon, 22 May 2017 14:55:04 +0000 (UTC)
From: "Till Rohrmann (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-6328) Savepoints must not be counted as retained checkpoints
archived-at: Mon, 22 May 2017 14:55:08 -0000

    [ https://issues.apache.org/jira/browse/FLINK-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019663#comment-16019663 ]

Till Rohrmann commented on FLINK-6328:
--------------------------------------

Given that the lifecycle of a savepoint is outside the control of the {{CheckpointCoordinator}}, I think it is best not to add savepoints to the {{CompletedCheckpointStore}} and, thus, not to consider them for job recovery. The reason for this is FLINK-4815: without it, a single broken or deleted savepoint will thwart Flink's whole recovery mechanism.

Once FLINK-4815 has been added, we might reconsider re-adding savepoints to the {{CompletedCheckpointStore}} and, thus, allowing recovery from savepoints in case of failures. When doing so, we should, however, not count the savepoints toward the number of retained checkpoints, because we cannot be sure that they still exist.

> Savepoints must not be counted as retained checkpoints
> ------------------------------------------------------
>
>                 Key: FLINK-6328
>                 URL: https://issues.apache.org/jira/browse/FLINK-6328
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stephan Ewen
>            Assignee: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.2
>
>
> The Checkpoint Store retains the *n* latest checkpoints.
> Savepoints are counted as well, meaning that for settings with 1 retained checkpoint, there are sometimes no retained checkpoints at all, only a savepoint.
> That is dangerous, because savepoints must be assumed to disappear at any point in time - their lifecycle is out of control of the CheckpointCoordinator.
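
To make the failure mode in the issue description concrete, here is a minimal, self-contained sketch of a bounded store with one retained entry. It is not Flink's code: `RetentionSketch`, `Snapshot`, `BoundedStore`, and the `admitSavepoints` flag are hypothetical names invented for illustration, and the real {{CompletedCheckpointStore}} may behave differently in detail. With savepoints admitted, a single savepoint evicts the only checkpoint. With savepoints kept out of the store, as proposed in the comment, the checkpoint survives.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RetentionSketch {

    /** Hypothetical stand-in for a completed checkpoint or savepoint (not a Flink class). */
    static final class Snapshot {
        final long id;
        final boolean savepoint;

        Snapshot(long id, boolean savepoint) {
            this.id = id;
            this.savepoint = savepoint;
        }
    }

    /** Hypothetical store that keeps at most maxRetained entries, oldest evicted first. */
    static final class BoundedStore {
        private final int maxRetained;
        private final boolean admitSavepoints; // assumption: toggles the old vs. proposed behavior
        private final Deque<Snapshot> retained = new ArrayDeque<>();

        BoundedStore(int maxRetained, boolean admitSavepoints) {
            this.maxRetained = maxRetained;
            this.admitSavepoints = admitSavepoints;
        }

        void add(Snapshot s) {
            if (s.savepoint && !admitSavepoints) {
                return; // proposed behavior: savepoints never enter the store
            }
            retained.addLast(s);
            while (retained.size() > maxRetained) {
                retained.removeFirst(); // drop the oldest entry, whatever its kind
            }
        }

        /** Number of retained entries that are real checkpoints, i.e. safe to recover from. */
        long recoverableCheckpoints() {
            return retained.stream().filter(s -> !s.savepoint).count();
        }
    }

    public static void main(String[] args) {
        // Savepoints admitted and counted: the savepoint pushes out the only checkpoint.
        BoundedStore counting = new BoundedStore(1, true);
        counting.add(new Snapshot(1, false));
        counting.add(new Snapshot(2, true));
        System.out.println("savepoints counted: " + counting.recoverableCheckpoints());

        // Savepoints kept out of the store: the checkpoint is still retained.
        BoundedStore skipping = new BoundedStore(1, false);
        skipping.add(new Snapshot(1, false));
        skipping.add(new Snapshot(2, true));
        System.out.println("savepoints skipped: " + skipping.recoverableCheckpoints());
    }
}
```

The sketch models only the first option from the comment (never adding savepoints to the store). The later option, re-adding savepoints without counting them toward the retention limit, would need the eviction loop to count and evict checkpoints only.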