flink-user mailing list archives

From Theo Diefenthal <theo.diefent...@scoop-software.de>
Subject Re: [External] Measuring Kafka consumer lag
Date Tue, 16 Jun 2020 08:23:13 GMT
Hi Padarn, 

We configure our Flink KafkaConsumer with setCommitOffsetsOnCheckpoints(true). In this case,
the offsets are committed on each checkpoint for the consumer group of the application. We
have external monitoring on our Kafka consumer groups (just a small script) which writes
Kafka info such as startOffset, endOffset and the current committed position, for all consumer
groups and for each topic and partition, to our metrics DB. I like that approach to monitoring
as it is rather independent of Flink and thus reliable for detecting problems when Flink is too
slow. Of course, we also rely heavily on Flink's internal metrics, but for the first check of
"is everything ok?", we check the Kafka topic metrics and see "there are XX events coming
in and there is no lag (backpressure) => all fine".
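
To illustrate the idea, here is a minimal sketch of the lag calculation such a script could do. It is not our actual script: the function name compute_lag and the dictionary layout are made up for this example, and fetching the offsets from Kafka (for example with a Kafka admin or consumer client) is deliberately left out, so the offsets are passed in as plain dictionaries.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key type for this sketch: (topic name, partition number).
TopicPartition = Tuple[str, int]


def compute_lag(
    end_offsets: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, Optional[int]],
    start_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return the lag per partition: end offset minus committed position."""
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        position = committed.get(tp)
        if position is None:
            # Assumption: if the group has not committed anything yet,
            # count the lag from the earliest offset still in the topic.
            position = start_offsets.get(tp, 0)
        # Clamp at zero in case the committed offset was read after the end offset.
        lag[tp] = max(end - position, 0)
    return lag


if __name__ == "__main__":
    # Example values only; a real script would query these from Kafka.
    end = {("events", 0): 1500, ("events", 1): 900}
    start = {("events", 0): 100, ("events", 1): 0}
    committed = {("events", 0): 1400, ("events", 1): None}
    for tp, value in sorted(compute_lag(end, committed, start).items()):
        print(tp, value)
```

Note that with setCommitOffsetsOnCheckpoints(true) the committed position only advances when a checkpoint completes, so a lag measured this way includes whatever was consumed since the last completed checkpoint.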

Best regards 
Theo 


From: "Padarn Wilson" <padarn.wilson@grab.com> 
To: "Robert Metzger" <rmetzger@apache.org>, "user" <user@flink.apache.org> 
Sent: Tuesday, 16 June 2020 02:52:16 
Subject: Re: [External] Measuring Kafka consumer lag 

Thanks Robert. 
Yes, we monitor many of the Flink internal metrics, which is why I was surprised that we were
unable to notice the warning signs before our consumers notified us. 

It would be nice to measure the topic offset vs. the consumer group offset of the Flink consumer. 

On Tue, Jun 16, 2020 at 1:57 AM Robert Metzger <rmetzger@apache.org> wrote: 



Hi Padarn, 
I usually recommend the approach you described: accessing/monitoring the lag via Flink's metrics
system. Sometimes it also makes sense to consider application level metrics. 
I checked YouTube for past Flink Forward talks, but I couldn't find a video. I'm sure there
were users talking about best practices for monitoring Flink in the past ... 

Best, 
Robert 

On Sun, Jun 14, 2020 at 5:47 AM Padarn Wilson <padarn.wilson@grab.com> wrote: 


Hi all, 
I'm looking for some advice on how other people measure consumer lag for Kafka consumers.
Recently we had an application that looked like it was performing identically to before, but
all of a sudden the throughput of the job decreased dramatically. However, this was not clear
from our Flink metrics, only from the time vs. watermark time lag that our consumers were
measuring. 

How do people approach measuring this? 

Thanks, 
Padarn 


By communicating with Grab Inc and/or its subsidiaries, associate companies and jointly controlled
entities (“Grab Group”), you are deemed to have consented to the processing of your personal
data as set out in the Privacy Notice which can be viewed at https://grab.com/privacy/ 

This email contains confidential information and is only for the intended recipient(s). If
you are not the intended recipient(s), please do not disseminate, distribute or copy this
email Please notify Grab Group immediately if you have received this by mistake and delete
this email from your system. Email transmission cannot be guaranteed to be secure or error-free
as any information therein could be intercepted, corrupted, lost, destroyed, delayed or incomplete,
or contain viruses. Grab Group do not accept liability for any errors or omissions in the
contents of this email arises as a result of email transmission. All intellectual property
rights in this email and attachments therein shall remain vested in Grab Group, unless otherwise
provided by law. 







