Subject: Re: Avoid duplicate messages while restarting a job for an application upgrade
From: Antoine Philippot <antoine.philippot@teads.tv>
Date: Mon, 02 Oct 2017 15:30:53 +0000
To: Piotr Nowojski <piotr@data-artisans.com>
Cc: user@flink.apache.org

Thanks Piotr for your answer. Sadly we can't use Kafka 0.11 for now (and won't be able to for a while).

We cannot afford tens of thousands of duplicated messages for each application upgrade; can I help by working on this feature?
Do you have any hints or details on this part of that "TODO" list?

On Mon, 2 Oct 2017 at 16:50, Piotr Nowojski <piotr@data-artisans.com> wrote:
Hi,

For failure recovery with Kafka 0.9 it is not possible to avoid duplicated messages. Using Flink 1.4 (not yet released) combined with Kafka 0.11, it will be possible to achieve exactly-once end-to-end semantics when writing to Kafka. However, this is still a work in progress:

https://issues.apache.org/jira/browse/FLINK-6988

Note that this is a superset of the functionality you are asking for. Exactly-once just for clean shutdowns is also on our "TODO" list (it would/could support Kafka 0.9), but it is not currently being actively developed.
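As a rough sketch of the API shape being developed under FLINK-6988 (a hypothetical example only: names and the constructor signature may still change before the 1.4 release; broker address and topic are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceSinkExample {
    public static void attachSink(DataStream<String> stream) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder

        // EXACTLY_ONCE uses Kafka 0.11 transactions via a two-phase commit
        // protocol coordinated with Flink's checkpoints.
        FlinkKafkaProducer011<String> sink = new FlinkKafkaProducer011<>(
                "output-topic", // placeholder
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);

        stream.addSink(sink);
    }
}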

Piotr Nowojski
On Oct 2, 2017, at 3:35 PM, Antoine Philippot <antoine.philippot@teads.tv> wrote:

Hi,

I'm working on a Flink streaming app with a Kafka 0.9 to Kafka 0.9 use case which handles around 100k messages per second.

To upgrade our application we run a "flink cancel" with savepoint command, followed by a "flink run" with the previously saved savepoint and the new application fat jar as parameters. We noticed that we can have more than 50k duplicated messages in the Kafka sink, which is not idempotent.
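For reference, the upgrade procedure looks roughly like this (a sketch assuming the Flink 1.x CLI; the job ID, savepoint directory, and jar name are placeholders):

# Take a savepoint and cancel the running job in one step
bin/flink cancel -s hdfs:///flink/savepoints <jobID>

# Resume the new version of the application from that savepoint
bin/flink run -s hdfs:///flink/savepoints/savepoint-xxxx my-app-assembly.jar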
This behaviour is problematic for this project, and I am trying to find a solution / workaround to avoid these duplicated messages.

The JobManager clearly indicates that the cancel call is triggered once the savepoint is finished, but during the savepoint execution the Kafka source continues to poll new messages, which will not be part of the savepoint and will be replayed on the next application start.

I tried to find a solution with the stop command-line argument, but the Kafka source doesn't implement StoppableFunction (https://issues.apache.org/jira/browse/FLINK-3404), and savepoint generation is not available with stop, unlike with cancel.
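To illustrate what FLINK-3404 is about, here is a minimal sketch of a source that supports graceful stopping (a hypothetical source, not the actual FlinkKafkaConsumer09, which is precisely the class missing this interface):

import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Hypothetical source implementing both interfaces; the real
// FlinkKafkaConsumer09 implements SourceFunction but not StoppableFunction.
public class StoppableExampleSource implements SourceFunction<String>, StoppableFunction {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // Emit under the checkpoint lock so records and offsets stay
            // consistent with checkpoints/savepoints.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("record"); // placeholder payload
            }
        }
    }

    @Override
    public void cancel() {
        // Hard cancel: terminate without any end-of-stream guarantees.
        running = false;
    }

    @Override
    public void stop() {
        // Graceful stop: no more records are emitted after this returns,
        // leaving the pipeline in a consistent state.
        running = false;
    }
}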

Is there another solution to avoid processing duplicated messages on each application upgrade or rescaling?

If not, has anyone planned to implement it? Otherwise, I can propose a pull request after some architecture advice.

The final goal is to stop polling the source and trigger a savepoint once polling has stopped.

Thanks
