From: Rong Rong <walterddr@gmail.com>
Date: Mon, 8 Jul 2019 19:23:30 -0700
Subject: Re: How are kafka consumer offsets handled if sink fails?
To: John Smith <java.dev.mtl@gmail.com>
Cc: Konstantin Knauf <konstantin@ververica.com>, user <user@flink.apache.org>
Hi John,

I think what Konstantin is trying to say is: Flink's Kafka consumer does not start consuming from the Kafka committed offset when starting the consumer; it actually starts with the offset that was last checkpointed to the external DFS. (That is, the consumer's starting point has no relevance to the Kafka committed offset whatsoever - if checkpointing is enabled.)

To quote:

"the Flink Kafka Consumer only commits offsets back to Kafka on a best-effort basis after every checkpoint. Internally Flink "commits" the [checkpoints]->[current Kafka offset] as part of its periodic checkpoints."

However, if you do not enable checkpointing, I think your consumer will by default restart from the default Kafka offset (which I think is your committed group offset).
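For illustration, a minimal sketch of how the start position interacts with checkpointing (a snippet rather than a complete program; it assumes the Flink 1.8 universal Kafka connector and mirrors John's snippet further down, with placeholder broker/group values):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
props.setProperty("group.id", "my-group");                // placeholder group

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), props);
// Only consulted on a fresh start with no checkpoint/savepoint:
consumer.setStartFromGroupOffsets(); // the default: resume from the committed group offset
// When recovering from a checkpoint, Flink ignores the start position
// and restores the checkpointed offsets instead.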

--
Rong

On Mon, Jul 8, 2019 at 6:39 AM John Smith <java.dev.mtl@gmail.com> wrote:
So when we say a sink is at-least-once, it's because internally it's not checking any kind of state and it sends what it has regardless, correct? I ask because I will build a sink that calls stored procedures.
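(For illustration, a minimal sketch of such a sink - the JDBC URL and procedure name here are made up, and error handling is omitted. It is at-least-once precisely because it keeps no Flink state and has no transaction tied to checkpoints: after a failure, replayed records are simply executed again.)

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class StoredProcSink extends RichSinkFunction<String> {
    private transient Connection conn;
    private transient CallableStatement stmt;

    @Override
    public void open(Configuration parameters) throws Exception {
        conn = DriverManager.getConnection("jdbc:postgresql://db:5432/app"); // assumed URL
        stmt = conn.prepareCall("{call upsert_event(?)}"); // assumed procedure name
    }

    @Override
    public void invoke(String value, Context context) throws Exception {
        // No state, no checkpoint-coordinated transaction: a record replayed
        // after recovery is executed a second time -> at-least-once.
        stmt.setString(1, value);
        stmt.execute();
    }

    @Override
    public void close() throws Exception {
        if (stmt != null) stmt.close();
        if (conn != null) conn.close();
    }
}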

On Sun., Jul. 7, 2019, 4:03 p.m. Konstantin Knauf, <konstantin@ververica.com> wrote:
Hi John,

in case of a failure (e.g. in the SQL sink) the Flink job will be restarted from the last checkpoint. This means the offsets of all Kafka partitions will be reset to that point in the stream, along with the state of all operators. To enable checkpointing you need to call StreamExecutionEnvironment#enableCheckpointing(). If you use the JDBCSinkFunction (which is an at-least-once sink), the output will be duplicated in the case of failures.
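A minimal sketch of enabling it (snippet only; the 10-second interval is illustrative, not a recommendation):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Without this call nothing is checkpointed, so a restart cannot restore
// the exact Kafka offsets and falls back to the consumer's start position.
env.enableCheckpointing(10_000); // checkpoint every 10s (illustrative value)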

To answer your questions:

* For this, the FlinkKafkaConsumer handles the offsets manually (no auto-commit).
* No, the Flink Kafka Consumer only commits offsets back to Kafka on a best-effort basis after every checkpoint. Internally Flink "commits" the current Kafka offsets as part of its periodic checkpoints (see the sketch after this list).
* Yes, along with all other events between the last checkpoint and the failure.
* It will continue from the last checkpoint.
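(As a sketch of the best-effort commit mentioned above, assuming the consumer variable from John's snippet below:

// Purely for external monitoring tools; Flink's own recovery uses checkpoints.
consumer.setCommitOffsetsOnCheckpoints(true); // the default when checkpointing is enabled

Setting this to false stops Flink from writing any offsets back to Kafka at all.)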

Hope this helps.

Cheers,

Konstantin

On Fri, Jul 5, 2019 at 8:37 PM John Smith <java.dev.mtl@gmail.com> wrote:
Hi, I'm using Apache Flink 1.8.0.

I'm consuming events from Kafka using nothing fancy...

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", kafkaAddress);
props.setProperty("group.id", kafkaGroup);

FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), props);

Do some JSON transforms and then push to my SQL database using JDBC and a stored procedure. Let's assume the SQL sink fails.

We know that Kafka can either periodically commit offsets or it can be done manually based on the consumer's logic.

- How are the source Kafka consumer offsets handled?
- Does the Flink Kafka consumer commit the offset per event/record?
- Will that single event that failed be retried?
- So if we had 5 incoming events and say on the 3rd one it failed, will it continue on the 3rd or will the job restart and try those 5 events?




--

Konstantin Knauf | Solutions Architect

+49 160 91394525


Planned Absences: 10.08.2019 - 31.08.2019, 05.09.2019 - 06.09.2019


--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
