flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Why would a kafka source checkpoint take so long?
Date Wed, 12 Jul 2017 13:27:12 GMT
Can it be that the checkpoint thread is waiting to grab the lock, which is
held by the chain under backpressure?
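
To illustrate the question, here is a minimal, self-contained sketch of that kind of lock contention. It does not use Flink's API: the class CheckpointLockDemo, the method measureCheckpointWaitMillis and the sleep that stands in for a blocked emit call are all hypothetical. It assumes the source chain emits records while holding a lock that the checkpoint thread also needs, so the checkpoint waits for as long as the emit is blocked.

```java
import java.util.concurrent.CountDownLatch;

public class CheckpointLockDemo {

    /**
     * Simulates a source chain that holds a shared lock while it is blocked for
     * blockedMillis (as if downstream were backpressured) and returns how long
     * a checkpoint thread had to wait before it could acquire the same lock.
     * All names here are illustrative; this is not Flink code.
     */
    public static long measureCheckpointWaitMillis(long blockedMillis) {
        final Object checkpointLock = new Object();
        final CountDownLatch lockHeld = new CountDownLatch(1);

        Thread sourceThread = new Thread(() -> {
            synchronized (checkpointLock) {
                lockHeld.countDown();
                try {
                    // Stand-in for an emit call that blocks while downstream buffers are full.
                    Thread.sleep(blockedMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "source-chain");
        sourceThread.start();

        try {
            // Make sure the source already holds the lock before the checkpoint starts.
            lockHeld.await();
            long start = System.nanoTime();
            long waitedNanos;
            synchronized (checkpointLock) {
                // Snapshotting offsets would be quick; the time was spent acquiring the lock.
                waitedNanos = System.nanoTime() - start;
            }
            sourceThread.join();
            return waitedNanos / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        long waited = measureCheckpointWaitMillis(500);
        System.out.println("checkpoint thread waited ~" + waited + " ms for the lock");
    }
}
```

If something like this is happening in the job, timing the lock acquisition separately from the snapshot itself (as the sketch does) should show whether the delay comes from waiting on the lock rather than from writing the offsets.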

On Wed, Jul 12, 2017 at 12:23 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:

> Yes, that's definitely what I am about to do next, but I just thought
> maybe someone has seen this before.
>
> Will post info next time it happens. (Not guaranteed to happen soon as it
> didn't happen for a long time before)
>
> Gyula
>
> On Wed, Jul 12, 2017, 12:13 Stefan Richter <s.richter@data-artisans.com>
> wrote:
>
>> Hi,
>>
>> could you add some logging to figure out which method call introduces
>> the delay?
>>
>> Best,
>> Stefan
>>
>> On 12.07.2017 at 11:37, Gyula Fóra <gyula.fora@gmail.com> wrote:
>>
>> Hi,
>>
>> We are using the latest release, 1.3.1.
>>
>> Gyula
>>
>> Urs Schoenenberger <urs.schoenenberger@tngtech.com> wrote (on Wed,
>> Jul 12, 2017, 10:44):
>>
>>> Hi Gyula,
>>>
>>> Unfortunately I don't know the cause, but we observed a similar issue
>>> on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
>>> Which version are you running on?
>>>
>>> Urs
>>>
>>> On 12.07.2017 09:48, Gyula Fóra wrote:
>>> > Hi,
>>> >
>>> > I have noticed strange behavior in one of our jobs: every once in a
>>> > while, the Kafka source checkpointing time becomes extremely large
>>> > compared to what it usually is. (To be very specific, it is a Kafka
>>> > source chained with a stateless map operator.)
>>> >
>>> > Checkpointing the offsets usually takes around 10 ms, which sounds
>>> > reasonable, but in some checkpoints this goes into the 3-5 minute
>>> > range, practically blocking the job for that period of time.
>>> > Yesterday I even observed 10-minute delays. At first I thought that
>>> > some sources might trigger checkpoints later than others, but after
>>> > adding some logging and comparing, it seems that triggerCheckpoint
>>> > was received at the same time.
>>> >
>>> > Interestingly, only one of the 3 Kafka sources in the job seems to be
>>> > affected (last time I checked, at least). We are still using the 0.8
>>> > consumer with commit on checkpoints. Also, I don't see this happen in
>>> > other jobs.
>>> >
>>> > Any clue on what might cause this?
>>> >
>>> > Thanks :)
>>> > Gyula
>>>
>>> --
>>> Urs Schönenberger - urs.schoenenberger@tngtech.com
>>>
>>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>> Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>> Registered office: Unterföhring * Amtsgericht München * HRB 135082
>>>
>>
>>
