flink-user mailing list archives

From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Why would a kafka source checkpoint take so long?
Date Wed, 12 Jul 2017 10:23:01 GMT
Yes, that's definitely what I am about to do next, but I just thought maybe
someone had seen this before.

I will post more info the next time it happens. (It is not guaranteed to
happen soon, as it didn't happen for a long time before.)

Gyula

On Wed, Jul 12, 2017, 12:13 Stefan Richter <s.richter@data-artisans.com>
wrote:

> Hi,
>
> could you introduce some logging to figure out from which method call the
> delay is introduced?
>
> Best,
> Stefan
>
> On 12.07.2017 at 11:37, Gyula Fóra <gyula.fora@gmail.com> wrote:
>
> Hi,
>
> We are using the latest release, 1.3.1.
>
> Gyula
>
> Urs Schoenenberger <urs.schoenenberger@tngtech.com> wrote (on Wed,
> Jul 12, 2017, 10:44):
>
>> Hi Gyula,
>>
>> I don't know the cause, unfortunately, but we observed a similar issue
>> on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
>> Which version are you running on?
>>
>> Urs
>>
>> On 12.07.2017 09:48, Gyula Fóra wrote:
>> > Hi,
>> >
>> > I have noticed a strange behavior in one of our jobs: every once in a
>> > while the Kafka source checkpointing time becomes extremely large
>> > compared to what it usually is. (To be very specific, it is a Kafka
>> > source chained with a stateless map operator.)
>> >
>> > To be more specific, checkpointing the offsets usually takes around
>> > 10 ms, which sounds reasonable, but in some checkpoints this goes into
>> > the 3-5 minute range, practically blocking the job for that period of
>> > time. Yesterday I even observed 10-minute delays. At first I thought
>> > that some sources might trigger checkpoints later than others, but
>> > after adding some logging and comparing, it seems that the
>> > triggerCheckpoint was received at the same time.
>> >
>> > Interestingly, only one of the 3 Kafka sources in the job seems to be
>> > affected (last time I checked, at least). We are still using the 0.8
>> > consumer with commit on checkpoints. Also, I don't see this happen in
>> > other jobs.
>> >
>> > Any clue on what might cause this?
>> >
>> > Thanks :)
>> > Gyula
>>
>> --
>> Urs Schönenberger - urs.schoenenberger@tngtech.com
>>
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> Registered office: Unterföhring * Amtsgericht München * HRB 135082
>>
>
>
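
The logging suggested in this thread (finding out which method call introduces the delay) could be sketched roughly as below. This is only an illustration in plain Java, not code from the thread and not part of Flink or Kafka: the names StepTimer, Timing, time and slowest, as well as the 50 ms warning threshold used in the example, are all hypothetical. The idea is to wrap each suspect call on the checkpoint path in a timed step and report any step that exceeds the threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper (not a Flink API): times named steps and reports slow ones. */
final class StepTimer {

    /** One measurement: the step label and its elapsed wall-clock time. */
    static final class Timing {
        final String label;
        final long millis;

        Timing(String label, long millis) {
            this.label = label;
            this.millis = millis;
        }
    }

    private final long warnThresholdMillis;
    // Synchronized list, since steps may be timed from more than one thread.
    private final List<Timing> timings =
            Collections.synchronizedList(new ArrayList<>());

    StepTimer(long warnThresholdMillis) {
        this.warnThresholdMillis = warnThresholdMillis;
    }

    /** Runs the step, records how long it took, and warns if it was slow. */
    <T> T time(String label, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            timings.add(new Timing(label, elapsed));
            if (elapsed >= warnThresholdMillis) {
                System.err.println("SLOW step '" + label + "' took " + elapsed + " ms");
            }
        }
    }

    /** Returns a copy of all measurements recorded so far. */
    List<Timing> timings() {
        synchronized (timings) {
            return new ArrayList<>(timings);
        }
    }

    /** Returns the slowest recorded step, or null if nothing was timed. */
    Timing slowest() {
        Timing worst = null;
        for (Timing t : timings()) {
            if (worst == null || t.millis > worst.millis) {
                worst = t;
            }
        }
        return worst;
    }
}
```

With something like this, each call made while the source handles a checkpoint could be wrapped as timer.time("some label", () -> theCall()), and comparing the per-step timings of a 10 ms checkpoint with those of a multi-minute one should show where the time goes. Which calls to wrap depends on the actual source code in use; the labels are placeholders.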
