flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Piotr Nowojski <pi...@ververica.com>
Subject Re: How does Flink handle backpressure in EMR
Date Thu, 05 Dec 2019 19:03:23 GMT
Hi,

You can find information how to use metrics here [1]. I don’t think there is a straightforward
way to access them from within a job. You could access them via JMX when using JMXReporter
or you can implement some custom reporter, that could expose the metrics via localhost connections
or some static (:S) variables.

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html>

> On 5 Dec 2019, at 19:40, Nguyen, Michael <Michael.Nguyen79@T-Mobile.com> wrote:
> 
> Hi Piotrek,
>  
> For the second article, I understand I can monitor the backpressure status via the Flink
Web UI. Can I refer to the same metrics in my Flink jobs itself? For example, can I put in
an if statement to check for when outPoolUsage reaches 100%?
>  
> Thank you,
> Michael
>  
> From: Piotr Nowojski <piotr@ververica.com <mailto:piotr@ververica.com>>
> Date: Thursday, December 5, 2019 at 10:27 AM
> To: Michael Nguyen <Michael.Nguyen79@T-Mobile.com <mailto:Michael.Nguyen79@T-Mobile.com>>
> Cc: Khachatryan Roman <khachatryan.roman@gmail.com <mailto:khachatryan.roman@gmail.com>>,
"user@flink.apache.org <mailto:user@flink.apache.org>" <user@flink.apache.org <mailto:user@flink.apache.org>>
> Subject: Re: How does Flink handle backpressure in EMR
>  
> [External]
>  
> Hi, 
>  
> If you are using event time and watermarks, you can monitor the delays using `currentInputWatermark`
metric [1]. If not (or alternatively), this blog post [2] describes how to check back pressure
status [2] for Flink up to 1.9. In Flink 1.10 there will be an additional new metric for that
[3].
>  
> Piotrek
>  
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/debugging_event_time.html
<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.9%2Fmonitoring%2Fdebugging_event_time.html&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570374404&sdata=GS8PCzeohW95e%2BT5phGljgHdMArImMjqBxSkR79dIzw%3D&reserved=0>
> [2] https://flink.apache.org/2019/07/23/flink-network-stack-2.html <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fflink.apache.org%2F2019%2F07%2F23%2Fflink-network-stack-2.html&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570384404&sdata=fA%2FBiBnL%2BdqLN4FJ9t2%2B71b7M7Ii7rjfrsmRUlqgIiA%3D&reserved=0>
> [3] https://issues.apache.org/jira/browse/FLINK-14813 <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FFLINK-14813&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570384404&sdata=NSe%2BN9ur9u3YLEGqqq%2F%2FBH8XXgd2jtlwV67LUyXmA8A%3D&reserved=0>
> 
> 
>> On 5 Dec 2019, at 19:11, Nguyen, Michael <Michael.Nguyen79@T-Mobile.com <mailto:Michael.Nguyen79@T-Mobile.com>>
wrote:
>>  
>> Hi Roman,
>>  
>> So right now we have a couple Flink jobs that consumes data from one Kinesis data
stream. These jobs vary from a simple dump into a PostgreSQL table to calculating anomalies
in a 30 minute window.
>>  
>> One large scenario we were worried about was what if one of our jobs was taking a
long time to process the Kinesis stream data? How would we detect this scenario from within
our Flink job?
>>  
>> We do not want our Flink jobs to lag too far from the latest point in our Kinesis
stream as we are trying to deliver information in (near) real-time.
>>  
>> From: Khachatryan Roman <khachatryan.roman@gmail.com <mailto:khachatryan.roman@gmail.com>>
>> Date: Thursday, December 5, 2019 at 9:47 AM
>> To: Michael Nguyen <Michael.Nguyen79@T-Mobile.com <mailto:Michael.Nguyen79@T-Mobile.com>>
>> Cc: Piotr Nowojski <piotr@ververica.com <mailto:piotr@ververica.com>>,
"user@flink.apache.org <mailto:user@flink.apache.org>" <user@flink.apache.org <mailto:user@flink.apache.org>>
>> Subject: Re: How does Flink handle backpressure in EMR
>>  
>> [External]
>>  
>> @Michael, 
>> Could you please describe your topology with which operators being slow, back-pressured
and probably skews in sources?
>>  
>> Regards,
>> Roman
>>  
>>  
>> On Thu, Dec 5, 2019 at 6:20 PM Nguyen, Michael <Michael.Nguyen79@t-mobile.com
<mailto:Michael.Nguyen79@t-mobile.com>> wrote:
>>> Thank you for the response Roman and Piotrek!
>>> 
>>> @Roman - can you clarify on what you mean when you mentioned Flink propagating
it back to the sources? 
>>> 
>>> Also, if one of my Flink operators is processing records too slowly and is getting
further away from the latest record of my source data stream, is there a way to detect this
slow processing in Flink? Would this be detected by Flink's backpressure mechanism?
>>> 
>>> Thanks,
>>> Michael
>>> 
>>> On 12/5/19, 7:57 AM, "Piotr Nowojski" <piotr@data-artisans.com <mailto:piotr@data-artisans.com>
on behalf of piotr@ververica.com <mailto:piotr@ververica.com>> wrote:
>>> 
>>>     [External]
>>> 
>>> 
>>>     Hi Michael,
>>> 
>>>     As Roman pointed out Flink currently doesn’t support the auto-scaling.
It’s on our roadmap but it requires quite a bit of preliminary work to happen before.
>>> 
>>>     Piotrek
>>> 
>>>     > On 5 Dec 2019, at 15:32, r_khachatryan <khachatryan.roman@gmail.com
<mailto:khachatryan.roman@gmail.com>> wrote:
>>>     >
>>>     > Hi Michael
>>>     >
>>>     > Flink *does* detect backpressure but currently, it only propagates it
back
>>>     > to sources.
>>>     > And so it doesn't support auto-scaling.
>>>     >
>>>     > Regards,
>>>     > Roman
>>>     >
>>>     >
>>>     > Nguyen, Michael wrote
>>>     >> How does Flink handle backpressure (caused by an increase in traffic)
in a
>>>     >> Flink job when it’s being hosted in an EMR cluster? Does Flink
detect the
>>>     >> backpressure and auto-scales the EMR cluster to handle the workload
to
>>>     >> relieve the backpressure? Once the backpressure is gone, then the
EMR
>>>     >> cluster would scale back down?
>>>     >
>>>     >
>>>     >
>>>     >
>>>     >
>>>     > --
>>>     > Sent from: https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&amp;data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cfda964dacbf04cc2d2cb08d7799bc347%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111582216065152&amp;sdata=FIHbAWbkH7Tq8AjG%2BVSZoxNrOl0Bn6SukN%2BY1PyhSHg%3D&amp;reserved=0
<https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570394400&sdata=j0YcpGEm3Lour%2FbVi2WX7hSH1vtxcBRUXNgeNnrCgBU%3D&reserved=0>

Mime
View raw message