flink-user mailing list archives

From Urs Schoenenberger <urs.schoenenber...@tngtech.com>
Subject Re: Why would a kafka source checkpoint take so long?
Date Wed, 12 Jul 2017 08:44:21 GMT
Hi Gyula,

Unfortunately I don't know the cause, but we observed a similar issue
on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
Which version are you running on?

Urs

On 12.07.2017 09:48, Gyula Fóra wrote:
> Hi,
> 
> I have noticed a strange behavior in one of our jobs: every once in a while
> the Kafka source checkpointing time becomes extremely large compared to
> what it usually is. (To be precise, it is a Kafka source chained with
> a stateless map operator.)
> 
> More specifically, checkpointing the offsets usually takes around 10 ms,
> which sounds reasonable, but in some checkpoints this goes into the 3-5
> minute range, practically blocking the job for that period of time.
> Yesterday I even observed 10-minute delays. At first I thought that some
> sources might trigger checkpoints later than others, but after adding some
> logging and comparing, it seems that triggerCheckpoint was received
> at the same time.
> 
> Interestingly, only one of the 3 Kafka sources in the job seems to be
> affected (at least the last time I checked). We are still using the 0.8
> consumer with commit on checkpoints. Also, I don't see this happen in other
> jobs.
> 
> Any clue on what might cause this?
> 
> Thanks :)
> Gyula

-- 
Urs Schönenberger - urs.schoenenberger@tngtech.com

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
