Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: <1475553079759-9300.post@n4.nabble.com>
References: <CAGq0LH1MwvjDK8aBGmDDq6HkX11OEb-aNLs8=varq6XXUd3zPA@mail.gmail.com>
 <CAC27z=OHNaGxTiHCs628R_qBUaG5GNGihBrw8gvS_iZ2KvYgLw@mail.gmail.com>
 <CAGq0LH0bqnnShhu-Fae3ed+_Grog2i5YWCUqrks1sFx+Z=pCqA@mail.gmail.com>
 <CANC1h_soCrGHVH2u2gNqQ+7a=Ew0oC-OAJQxwNv49BAgMT0iwQ@mail.gmail.com>
 <CAGq0LH3sVF-VPbHOtvPLXtLzyzjhF_ySnEDYKamtabDx6-F=pQ@mail.gmail.com>
 <CANC1h_ucdSZhKWxhAbipEGUYPy1Ds=xXXAGjnC=sQacu+TcAXQ@mail.gmail.com>
 <1470680686177-8375.post@n4.nabble.com> <CAGr9p8A1i59NHiHzg6fBOWnVQL0+f8TQSw0i2rkbO9byuthkjA@mail.gmail.com>
 <1475553079759-9300.post@n4.nabble.com>
From: Stephan Ewen <sewen@apache.org>
Date: Thu, 6 Oct 2016 17:30:08 +0200
Message-ID: <CANC1h_ss3tPwu+UKd7Xu2=iCvz7y0c4xSwfLvFJ-q8s-rmGxGw@mail.gmail.com>
Subject: Re: Flink Kafka Consumer Behaviour
To: user@flink.apache.org
Content-Type: multipart/alternative; boundary=001a113f6ade044786053e33f821
archived-at: Thu, 06 Oct 2016 15:30:12 -0000

--001a113f6ade044786053e33f821
Content-Type: text/plain; charset=UTF-8

Hi!

There was an issue in the Kafka 0.9 consumer in Flink concerning
checkpoints. It was relevant mostly for lower-throughput topics /
partitions.

It is fixed in the 1.1.3 release. Can you try out the release candidate and
see if that solves your problem?
See here for details on the release candidate:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-Apache-Flink-1-1-3-RC1-td13860.html

To test this, set the version of the flink-connector-kafka-09 dependency to
"1.1.3" and add the staging repository described in the above link to your
pom.xml.
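
A rough sketch of what that pom.xml change could look like. The repository id
and the STAGING_REPOSITORY_URL placeholder are hypothetical (take the real URL
from the release candidate thread linked above), and the artifact ID with the
_2.10 Scala suffix is an assumption about your build:

```xml
<!-- Sketch only: the id and URL below are placeholders, not real values. -->
<repositories>
  <repository>
    <id>flink-1.1.3-rc-staging</id>
    <url>STAGING_REPOSITORY_URL</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <!-- Assumed artifact ID; use the Scala suffix that matches your build. -->
    <artifactId>flink-connector-kafka-0.9_2.10</artifactId>
    <version>1.1.3</version>
  </dependency>
</dependencies>
```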

Thanks,
Stephan


On Tue, Oct 4, 2016 at 5:51 AM, ankitcha <ankitchaudhary123@gmail.com>
wrote:

> Hi Prabhu, cc Stephan, Robert,
>
> I was having similar issues where the Flink Kafka 0.9 consumer was not
> committing offsets to Kafka. After digging into the JobManager logs, I found
> that checkpoints were expiring before they completed, and hence the
> "checkpoint completed" message was being ignored.
>
> To verify, I increased the checkpoint timeout from the default 10 mins to
> 30 mins. Checkpoints then finished well before the timeout (in ~12 mins),
> and the consumer correctly updated offsets in Kafka.
>
> This seems to be working for us at the moment. Also note that this scenario
> normally happens at the start of the job, when the consumer group already
> has some decent lag.
>
> So, you might want to try increasing the checkpoint timeout and check
> whether that solves the issue. You should look for the following in the
> JobManager logs:
>
> [Checkpoint Timer] org.apache.flink.runtime.checkpoint.CheckpointCoordinator
> - Checkpoint 37 expired before completing.
> [Checkpoint Timer] org.apache.flink.runtime.checkpoint.CheckpointCoordinator
> - Triggering checkpoint 38 @ 1474313373634
> [Checkpoint Timer] org.apache.flink.runtime.checkpoint.CheckpointCoordinator
> - Checkpoint 38 expired before completing.
> [Checkpoint Timer] org.apache.flink.runtime.checkpoint.CheckpointCoordinator
> - Triggering checkpoint 39 @ 1474313973640
>
> --
> Ankit
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Kafka-Consumer-Behaviour-tp8257p9300.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>
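
For checking a JobManager log for the messages quoted above, here is a minimal
sketch in plain Java. It assumes the message text is exactly "Checkpoint <id>
expired before completing"; the class and method names are made up for
illustration and are not part of Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpiredCheckpointScan {
    // Matches the CheckpointCoordinator message quoted in the mail above.
    private static final Pattern EXPIRED =
            Pattern.compile("Checkpoint (\\d+) expired before completing");

    /** Returns the IDs of all checkpoints reported as expired, in log order. */
    public static List<Long> expiredIds(List<String> logLines) {
        List<Long> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = EXPIRED.matcher(line);
            if (m.find()) {
                ids.add(Long.parseLong(m.group(1)));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Sample lines copied from the log excerpt in the quoted message.
        List<String> sample = Arrays.asList(
                "- Checkpoint 37 expired before completing.",
                "- Triggering checkpoint 38 @ 1474313373634",
                "- Checkpoint 38 expired before completing.",
                "- Triggering checkpoint 39 @ 1474313973640");
        System.out.println("Expired checkpoints: " + expiredIds(sample));
    }
}
```

If this reports expired checkpoints shortly after job start, the checkpoint
timeout is a likely candidate to look at, as described in the quoted message.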

--001a113f6ade044786053e33f821--