Hi Gyula,

I have observed a similar issue with FlinkKafkaConsumer09 and FlinkKafkaConsumer010 and posted it to the mailing list as well. The issue is intermittent, but whenever it happens, checkpoints either fail or take a long time to complete.

Regards,
Vinay Patil

On Wed, Jul 12, 2017 at 7:00 PM, Gyula Fóra [via Apache Flink User Mailing List archive.] <[hidden email]> wrote:
I have added logging that will help determine this as well; next time this happens I will post the results. (Although there doesn't seem to be high backpressure.)

Thanks for the tips,
Gyula

On Wed, Jul 12, 2017, 15:27 Stephan Ewen <[hidden email]> wrote:
Can it be that the checkpoint thread is waiting to grab the lock, which is held by the chain under backpressure?
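
To illustrate the hypothesis above: in Flink's source model, records are emitted while holding the checkpoint lock, and the checkpoint itself is also taken under that lock. The following is a minimal, Flink-free sketch of that contention. The class and method names are hypothetical, and Thread.sleep merely stands in for an emit call blocked by backpressure. It shows how a snapshot that is cheap on its own can appear to take as long as the lock is held by the other thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CheckpointLockSketch {

    /**
     * Starts a "source" thread that holds the shared lock for holdMillis,
     * then measures how long the calling "checkpoint" thread has to wait
     * before it can enter the same lock. Returns the wait in milliseconds.
     */
    public static long measureLockWaitMillis(long holdMillis) {
        final Object checkpointLock = new Object();
        final CountDownLatch lockHeld = new CountDownLatch(1);

        Thread source = new Thread(() -> {
            synchronized (checkpointLock) {
                lockHeld.countDown(); // signal that the lock is now taken
                try {
                    // Stand-in for an emit call blocked by backpressure.
                    Thread.sleep(holdMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        source.start();

        try {
            lockHeld.await(); // make sure the source thread owns the lock first
            long start = System.nanoTime();
            synchronized (checkpointLock) {
                // The snapshot itself (e.g. copying offsets) would be cheap.
            }
            long waitedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            source.join();
            return waitedMillis;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        long waited = measureLockWaitMillis(300);
        System.out.println("checkpoint thread waited ~" + waited + " ms for the lock");
    }
}
```

If this is what happens in the job, the reported checkpoint duration would mostly be lock wait time rather than time spent snapshotting offsets.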

On Wed, Jul 12, 2017 at 12:23 PM, Gyula Fóra <[hidden email]> wrote:

Yes, that's definitely what I am about to do next; I just thought maybe someone had seen this before.

I will post more info the next time it happens. (It is not guaranteed to happen soon, as it hadn't happened for a long time before.)

Gyula


On Wed, Jul 12, 2017, 12:13 Stefan Richter <[hidden email]> wrote:
Hi,

Could you add some logging to figure out which method call introduces the delay?

Best,
Stefan
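
One way to do this kind of logging, as a rough sketch: wrap each suspect step in a small timing helper and log the elapsed time. StepTimer and timed are hypothetical names, not Flink APIs, and the lambda in main stands in for whichever call is being measured.

```java
import java.util.function.Supplier;

public class StepTimer {

    /**
     * Runs the given step, prints how long it took to stderr,
     * and returns the step's result unchanged.
     */
    public static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.err.println(label + " took " + elapsedMillis + " ms");
        }
    }

    public static void main(String[] args) {
        // Placeholder step; in a real job this would be the call under suspicion.
        int result = timed("snapshot offsets", () -> 42);
        System.out.println("result = " + result);
    }
}
```

Timing both the wait to enter a synchronized block and the work inside it separately helps tell lock contention apart from a slow call.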

On 12.07.2017 at 11:37, Gyula Fóra <[hidden email]> wrote:

Hi,

We are using the latest release, 1.3.1.

Gyula

On Wed, Jul 12, 2017, 10:44 Urs Schoenenberger <[hidden email]> wrote:
Hi Gyula,

I don't know the cause, unfortunately, but we observed a similar issue
on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
Which version are you running on?

Urs

On 12.07.2017 09:48, Gyula Fóra wrote:
> Hi,
>
> I have noticed strange behavior in one of our jobs: every once in a while
> the Kafka source checkpointing time becomes extremely large compared to
> what it usually is. (To be very specific, it is a Kafka source chained with
> a stateless map operator.)
>
> Checkpointing the offsets usually takes around 10 ms, which sounds
> reasonable, but in some checkpoints this goes into the 3-5 minute range,
> practically blocking the job for that period of time. Yesterday I even
> observed 10-minute delays. At first I thought that some sources might
> trigger checkpoints later than others, but after adding some logging and
> comparing the output, it seems that triggerCheckpoint was received at the
> same time.
>
> Interestingly, only one of the 3 Kafka sources in the job seems to be
> affected (last time I checked, at least). We are still using the 0.8
> consumer with commit on checkpoints. Also, I don't see this happen in other
> jobs.
>
> Any clue on what might cause this?
>
> Thanks :)
> Gyula

--
Urs Schönenberger - [hidden email]

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082





View this message in context: Re: Why would a kafka source checkpoint take so long?