flink-user mailing list archives

From vinay patil <vinay18.pa...@gmail.com>
Subject Re: Why would a kafka source checkpoint take so long?
Date Wed, 12 Jul 2017 18:50:52 GMT
Hi Gyula,

I have observed a similar issue with FlinkKafkaConsumer09 and FlinkKafkaConsumer010 and posted it to
the mailing list as well. This issue is not consistent; however, whenever
it happens, it leads to checkpoints failing or taking a long time to
complete.

Regards,
Vinay Patil

On Wed, Jul 12, 2017 at 7:00 PM, Gyula Fóra [via Apache Flink User Mailing
List archive.] <ml+s2336050n14210h62@n4.nabble.com> wrote:

> I have added logging that will help determine this as well; next time this
> happens I will post the results. (Although there doesn't seem to be high
> backpressure.)
>
> Thanks for the tips,
> Gyula
>
> Stephan Ewen <[hidden email]> wrote (Wed, Jul 12, 2017, 15:27):
>
>> Can it be that the checkpoint thread is waiting to grab the lock, which
>> is held by the chain under backpressure?
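A minimal sketch of the contention Stephan describes, in plain Java with no Flink dependency. It assumes the record-emitting path and the checkpoint snapshot synchronize on one shared lock object (in Flink, sources obtain such a lock via `SourceContext.getCheckpointLock()`); the class `CheckpointLockContention` and its method are hypothetical names for illustration. A "processing" thread holds the lock while sleeping, standing in for an emit call blocked by backpressure, and a "checkpoint" thread measures how long it waits before it can enter the same lock.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo: a checkpoint thread stalls while another thread holds the shared lock. */
public class CheckpointLockContention {

    /**
     * Holds the shared lock for holdMillis on a "processing" thread (simulating an
     * emit call blocked by backpressure), then returns how long a "checkpoint"
     * thread had to wait before it could enter the same lock.
     */
    public static long measureCheckpointWaitMillis(long holdMillis) throws InterruptedException {
        final Object checkpointLock = new Object();
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final long[] waitedMillis = new long[1];

        Thread processing = new Thread(() -> {
            synchronized (checkpointLock) {
                lockHeld.countDown();
                try {
                    // Stand-in for a full downstream buffer: the emit call blocks
                    // while the lock is still held.
                    Thread.sleep(holdMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread checkpoint = new Thread(() -> {
            long start = System.nanoTime();
            synchronized (checkpointLock) {
                // The snapshot work here is trivial; the cost is the wait to get in.
                waitedMillis[0] = (System.nanoTime() - start) / 1_000_000;
            }
        });

        processing.start();
        lockHeld.await();   // ensure the processing thread owns the lock first
        checkpoint.start();
        processing.join();
        checkpoint.join();  // join() makes waitedMillis[0] visible to this thread
        return waitedMillis[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("checkpoint waited ~" + measureCheckpointWaitMillis(200) + " ms");
    }
}
```

The checkpoint thread typically reports a wait close to the hold time even though the work inside the lock is trivial, which is consistent with a snapshot that normally takes milliseconds but occasionally takes as long as the lock holder stays blocked.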
>>
>> On Wed, Jul 12, 2017 at 12:23 PM, Gyula Fóra <[hidden email]> wrote:
>>
>>> Yes, that's definitely what I am about to do next, but I just thought maybe
>>> someone had seen this before.
>>>
>>> I will post info next time it happens. (It is not guaranteed to happen soon,
>>> as it hadn't happened for a long time before.)
>>>
>>> Gyula
>>>
>>> On Wed, Jul 12, 2017, 12:13 Stefan Richter <[hidden email]> wrote:
>>>
>>>> Hi,
>>>>
>>>> could you introduce some logging to figure out which method call
>>>> introduces the delay?
>>>>
>>>> Best,
>>>> Stefan
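One way to act on Stefan's suggestion, sketched in plain Java: wrap each suspected call in a timer that logs its wall-clock duration, so the slow step stands out in the logs. `StepTimer`, `timed`, and the step labels are hypothetical names for illustration, not Flink API; `java.util.logging` is used only to keep the example dependency-free. The nested timers in `main` show how to separate time spent waiting for a lock from time spent doing work under it.

```java
import java.util.concurrent.Callable;
import java.util.logging.Logger;

/** Hypothetical helper: logs how long each wrapped step takes. */
public final class StepTimer {
    private static final Logger LOG = Logger.getLogger(StepTimer.class.getName());

    private StepTimer() {}

    /** Runs the action, logs its duration under the given label, and returns its result. */
    public static <T> T timed(String label, Callable<T> action) throws Exception {
        long start = System.nanoTime();
        try {
            return action.call();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            LOG.info(label + " took " + elapsedMillis + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        Object lock = new Object();
        // Outer timer includes the lock wait; inner timer covers only the work under
        // the lock. A large gap between the two logged values points at lock contention.
        String offsets = timed("snapshot incl. lock wait", () -> {
            synchronized (lock) {
                return timed("snapshot under lock", () -> "partition-0=42");
            }
        });
        // Hypothetical follow-up step, e.g. an offset commit.
        timed("commit", () -> { Thread.sleep(20); return null; });
        System.out.println(offsets);
    }
}
```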
>>>>
>>>> On 12.07.2017 at 11:37, Gyula Fóra <[hidden email]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We are using the latest version, 1.3.1.
>>>>
>>>> Gyula
>>>>
>>>> Urs Schoenenberger <[hidden email]> wrote (Wed, Jul 12, 2017, 10:44):
>>>>
>>>>> Hi Gyula,
>>>>>
>>>>> I don't know the cause, unfortunately, but we observed a similar issue
>>>>> on Flink 1.1.3. The problem seems to have gone away after upgrading to 1.2.1.
>>>>> Which version are you running?
>>>>>
>>>>> Urs
>>>>>
>>>>> On 12.07.2017 09:48, Gyula Fóra wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I have noticed a strange behavior in one of our jobs: every once in a
>>>>> > while, the Kafka source checkpointing time becomes extremely large
>>>>> > compared to what it usually is. (To be very specific, it is a Kafka
>>>>> > source chained with a stateless map operator.)
>>>>> >
>>>>> > More specifically, checkpointing the offsets usually takes around 10 ms,
>>>>> > which sounds reasonable, but in some checkpoints this goes into the
>>>>> > 3-5 minute range, practically blocking the job for that period of time.
>>>>> > Yesterday I even observed 10-minute delays. At first I thought that some
>>>>> > sources might trigger checkpoints later than others, but after adding
>>>>> > some logging and comparing, it seems that triggerCheckpoint was
>>>>> > received at the same time.
>>>>> >
>>>>> > Interestingly, only one of the 3 Kafka sources in the job seems to be
>>>>> > affected (last time I checked, at least). We are still using the 0.8
>>>>> > consumer with commit on checkpoints. Also, I don't see this happen in
>>>>> > other jobs.
>>>>> >
>>>>> > Any clue what might cause this?
>>>>> >
>>>>> > Thanks :)
>>>>> > Gyula
>>>>>
>>>>> --
>>>>> Urs Schönenberger - [hidden email]
>>>>>
>>>>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>>>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>>>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>>>>
>>>>
>>>>
>>
>




--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Why-would-a-kafka-source-checkpoint-take-so-long-tp14193p14232.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.