flink-user mailing list archives

From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: Why would a kafka source checkpoint take so long?
Date Wed, 12 Jul 2017 10:13:00 GMT
Hi,

Could you add some logging to figure out which method call introduces the delay?
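
A minimal sketch of what such logging could look like, in plain Java with no Flink dependency. `TimedCall`, `timed`, `describe`, and the threshold parameter are hypothetical names made up for illustration, not part of Flink or the Kafka connector. Which calls to wrap is also an assumption; candidates would be the individual steps on the checkpoint path of the affected source.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper (not a Flink class): times a call and reports it when it is slow.
final class TimedCall {
    private TimedCall() {}

    // Builds the log line; includes the thread name so entries from
    // different source subtasks can be told apart.
    static String describe(String label, long elapsedMs) {
        return label + " took " + elapsedMs + " ms on thread "
                + Thread.currentThread().getName();
    }

    // Runs body, measures wall-clock time, and prints a line to stderr
    // if the call took at least warnThresholdMs milliseconds.
    static <T> T timed(String label, long warnThresholdMs, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMs >= warnThresholdMs) {
                System.err.println("SLOW: " + describe(label, elapsedMs));
            }
        }
    }
}
```

Wrapping each suspect step separately, each with its own label, should show which one accounts for the minutes-long pauses. In a real job the `System.err` call would be replaced by the logger already in use.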

Best,
Stefan

> On 12.07.2017 at 11:37, Gyula Fóra <gyula.fora@gmail.com> wrote:
> 
> Hi,
> 
> We are using the latest 1.3.1
> 
> Gyula
> 
> Urs Schoenenberger <urs.schoenenberger@tngtech.com> wrote (on Wed, 12 Jul 2017, 10:44):
> Hi Gyula,
> 
> I don't know the cause unfortunately, but we observed a similar issue
> on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
> Which version are you running on?
> 
> Urs
> 
> On 12.07.2017 09:48, Gyula Fóra wrote:
> > Hi,
> >
> > I have noticed a strange behavior in one of our jobs: every once in a while
> > the Kafka source checkpointing time becomes extremely large compared to
> > what it usually is. (To be very specific, it is a Kafka source chained with
> > a stateless map operator.)
> >
> > Checkpointing the offsets usually takes around 10 ms, which sounds
> > reasonable, but in some checkpoints this goes into the 3-5 minute
> > range, practically blocking the job for that period of time.
> > Yesterday I even observed 10-minute delays. At first I thought that some
> > sources might trigger checkpoints later than others, but after adding some
> > logging and comparing the output, it seems that triggerCheckpoint was
> > received at the same time.
> >
> > Interestingly, only one of the 3 Kafka sources in the job seems to be
> > affected (last time I checked at least). We are still using the 0.8
> > consumer with commit on checkpoints. Also, I don't see this happen in other
> > jobs.
> >
> > Any clue on what might cause this?
> >
> > Thanks :)
> > Gyula
> 
> --
> Urs Schönenberger - urs.schoenenberger@tngtech.com
> 
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Managing directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Registered office: Unterföhring * Registry court: Amtsgericht München * HRB 135082

