From: Henri Heiskanen
Date: Thu, 10 Aug 2017 12:46:36 +0300
Subject: Re: difference between checkpoints & savepoints
To: Stefan Richter
Cc: "Raja.Aravapalli", "user@flink.apache.org"
Mailing-List: contact user-help@flink.apache.org; run by ezmlm

Hi,

It would be super helpful if Flink provided out-of-the-box functionality
for writing automatic savepoints and then starting from the latest
savepoint. If externalized checkpoints supported rescaling, the first
requirement would be met, but one would still need to, e.g., find the
latest checkpoint in some folder and pass it as an argument. We are
currently writing our own functionality for this. Why not just tell Flink
that this job uses persistent state, so that the default behavior is to
start from the latest snapshot?

Br,
Henri H

On Thu, Aug 10, 2017 at 11:20 AM, Stefan Richter
<s.richter@data-artisans.com> wrote:

> Hi,
>
> I would explain the main conceptual difference as follows:
>
> - Checkpoints are periodically triggered by the system for fault
>   tolerance. They are used to automatically recover from failures.
>   Because of their automatic and periodic nature, they should be
>   lightweight to produce and will restore the same job without any
>   changes to the job graph, parallelism, etc. Checkpoints are usually
>   dropped after the job was terminated by the user.
>
> - Savepoints are triggered by the user to store the state of the job
>   for a manual resume and backup. Savepoints are usually not periodic
>   but are typically taken before some user action on the job or the
>   system. For example, this could be an update of your Flink version,
>   changing your job graph, changing parallelism, forking a second job
>   like for a red/blue deployment, and so on. Of course, savepoints must
>   survive job termination. Conceptually, savepoints can be a bit more
>   expensive to produce, because they should have a format that makes
>   all those "changes to the job" features possible.
>
> Besides this conceptual difference, the current implementations
> basically use the same code and produce the same "format". However,
> there is currently one exception to this, and I would expect more
> differences in the future. The exception is incremental checkpoints
> with the RocksDB state backend. They use a RocksDB-internal format
> instead of Flink's "savepoint format". This makes them the first
> instance of a more lightweight checkpointing mechanism, compared to
> savepoints, at the cost of dropping support for certain features such
> as changing the parallelism.
>
> Furthermore, there also exist "externalized checkpoints", which are
> somewhere in between checkpoints and savepoints. They are triggered by
> Flink, but can survive job termination and can then be used by the user
> to restart the job, similar to savepoints. They use the checkpointing
> code path, so there are, for example, externalized incremental
> checkpoints. However, exactly like normal checkpoints, they might also
> lack certain features like rescalability.
>
> Best,
> Stefan
>
> On 10.08.2017 at 05:32, Raja.Aravapalli wrote:
>
> > Hi,
> >
> > Can someone please help me understand the difference between Flink's
> > Checkpoints & Savepoints.
> >
> > I read the documentation but couldn't understand the difference! :s
> >
> > Thanks a lot.
> >
> > Regards,
> > Raja.
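
For illustration only, the "find the latest checkpoint from some folder" step mentioned at the top of this message could look roughly like the sketch below. It is not Flink functionality. The function name `latest_checkpoint` is hypothetical, and the sketch assumes that each retained checkpoint sits in a subdirectory named `chk-<n>` with an increasing number `n`. That layout is an assumption and may not match a given Flink version or configuration.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption (not confirmed by this thread): retained checkpoints are
# stored as subdirectories named "chk-<n>", where a larger n is newer.
_CHK_DIR = re.compile(r"^chk-(\d+)$")


def latest_checkpoint(root: Path) -> Optional[Path]:
    """Return the "chk-<n>" subdirectory of root with the highest n.

    Returns None if root is not a directory or contains no match.
    """
    if not root.is_dir():
        return None

    best_id = -1
    best_path: Optional[Path] = None
    for entry in root.iterdir():
        match = _CHK_DIR.match(entry.name)
        # Skip plain files and directories with unrelated names.
        if match is None or not entry.is_dir():
            continue
        # Compare numerically so that chk-10 is newer than chk-9.
        checkpoint_id = int(match.group(1))
        if checkpoint_id > best_id:
            best_id = checkpoint_id
            best_path = entry
    return best_path
```

A wrapper script could then pass the returned path to the job submission, for example via the `-s` option of `flink run`, which resumes a job from a savepoint path. Whether a given checkpoint can actually be resumed this way depends on the caveats described in the quoted reply (for instance, rescaling may not be supported).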