From: jelmer
Date: Fri, 19 Jan 2018 00:26:42 +0100
Subject: Starting a job that does not use checkpointing from a savepoint is broken?
To: user@flink.apache.org

I ran into a rather annoying issue today while upgrading a Flink job from Flink 1.3.2 to 1.4.0.

This particular job uses neither checkpointing nor state.

I followed the instructions at https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/upgrading.html: first I created a savepoint, upgraded the cluster, then restarted the job from the savepoint.

This all went well until, a few hours later, one of our Kafka nodes died. This triggered an exception in the job, which was subsequently restarted. However, instead of picking up where it left off based on the offsets committed to Kafka (which is what should happen according to https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html), the Kafka offsets were reset to the point when I made the savepoint three hours earlier, and so it started reprocessing millions of messages.

Needless to say, creating a savepoint for a job without state or checkpoints does not make much sense. But I would not expect a restart from a savepoint to completely break a job in the case of failure.

I created a repository that reproduces the scenario I encountered:
https://github.com/jelmerk/flink-cancel-restart-job-without-checkpointing

Am I misunderstanding anything, or should I file a bug for this?
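For reference, the setup that triggers this is roughly the following: a minimal sketch of a Kafka-sourced job with checkpointing deliberately not enabled, so that offsets are only periodically auto-committed back to Kafka. This is not the actual job from the linked repository; the topic name, group id, and broker address are placeholders.

```java
import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class NoCheckpointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Note: env.enableCheckpointing(...) is deliberately NOT called,
        // so there is no checkpointed state to restore from.

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "my-group");            // placeholder

        FlinkKafkaConsumer011<String> consumer =
            new FlinkKafkaConsumer011<>(
                "my-topic", new SimpleStringSchema(), props);
        // Per the connector docs, on a plain start this reads from the
        // committed group offsets in Kafka. The surprise described above is
        // that after a restore from a savepoint, a later failure/restart
        // appears to rewind to the savepoint's offsets instead.
        consumer.setStartFromGroupOffsets();

        env.addSource(consumer).print();
        env.execute("job-without-checkpointing");
    }
}
```

The repository linked above contains the actual reproduction; this sketch only illustrates the configuration being discussed (no checkpointing, offsets committed to Kafka).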