Hi Roman,

 

So right now we have a couple of Flink jobs that consume data from one Kinesis data stream. These jobs range from a simple dump into a PostgreSQL table to calculating anomalies over a 30-minute window.

One scenario we are particularly worried about is one of our jobs taking a long time to process the Kinesis stream data. How would we detect this from within our Flink job?

We do not want our Flink jobs to lag too far behind the latest record in our Kinesis stream, as we are trying to deliver information in (near) real time.
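
To make the concern concrete, here is a rough sketch of the kind of check we have in mind. It is plain Java with no Flink dependencies, and the class name LagMonitor and the one-minute threshold are made up for illustration. It assumes each record carries a timestamp for when it entered the stream (something like an arrival timestamp), and it simply compares that timestamp with the time at which the job processes the record.

```java
import java.time.Duration;

/** Hypothetical helper: tracks how far processing trails behind the records it reads. */
final class LagMonitor {
    private final long maxLagMillis;
    private long worstLagMillis = 0L;

    LagMonitor(Duration maxLag) {
        this.maxLagMillis = maxLag.toMillis();
    }

    /**
     * Records one observation and returns the lag for that record in milliseconds.
     * recordTimestampMillis is assumed to be the time the record entered the stream;
     * nowMillis is the wall-clock time at which the job processes it.
     */
    long observe(long recordTimestampMillis, long nowMillis) {
        // Clamp to zero so clock skew never produces a negative lag.
        long lag = Math.max(0L, nowMillis - recordTimestampMillis);
        worstLagMillis = Math.max(worstLagMillis, lag);
        return lag;
    }

    /** True once any record seen since the last reset exceeded the allowed lag. */
    boolean isLagging() {
        return worstLagMillis > maxLagMillis;
    }

    long worstLagMillis() {
        return worstLagMillis;
    }

    /** Clears the worst-case lag, e.g. at the start of each reporting interval. */
    void reset() {
        worstLagMillis = 0L;
    }

    public static void main(String[] args) {
        LagMonitor monitor = new LagMonitor(Duration.ofMinutes(1));
        long now = System.currentTimeMillis();
        monitor.observe(now - 5_000L, now);    // record is 5 seconds old
        System.out.println("lagging: " + monitor.isLagging());
        monitor.observe(now - 120_000L, now);  // record is 2 minutes old
        System.out.println("lagging: " + monitor.isLagging());
    }
}
```

In a real job something like this would presumably be called once per record inside an operator and its value reported as a metric, but we are not sure whether that is the recommended approach.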

 

From: Khachatryan Roman <khachatryan.roman@gmail.com>
Date: Thursday, December 5, 2019 at 9:47 AM
To: Michael Nguyen <Michael.Nguyen79@T-Mobile.com>
Cc: Piotr Nowojski <piotr@ververica.com>, "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: How does Flink handle backpressure in EMR

 


 

@Michael,

Could you please describe your topology, including which operators are slow or back-pressured, and whether there is any skew in the sources?

 

Regards,

Roman

 

 

On Thu, Dec 5, 2019 at 6:20 PM Nguyen, Michael <Michael.Nguyen79@t-mobile.com> wrote:

Thank you for the response Roman and Piotrek!

@Roman - can you clarify what you meant when you mentioned Flink propagating it back to the sources?

Also, if one of my Flink operators is processing records too slowly and is getting further away from the latest record of my source data stream, is there a way to detect this slow processing in Flink? Would this be detected by Flink's backpressure mechanism?
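
My rough mental model of backpressure, which may well be wrong, is a bounded buffer between the source and a slow operator: once the buffer is full, the source cannot hand over more records until the operator frees a slot. The sketch below is only a toy illustration of that idea in plain Java; the class BackpressureSketch and its methods are made up and are not Flink's actual mechanism or API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Toy model of backpressure: a bounded buffer between a source and a slow operator. */
final class BackpressureSketch {
    private final BlockingQueue<String> buffer;
    private int rejected = 0;

    BackpressureSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Source side: returns false when the buffer is full, i.e. the source is back-pressured. */
    boolean emit(String record) {
        boolean accepted = buffer.offer(record);
        if (!accepted) {
            rejected++;
        }
        return accepted;
    }

    /** Operator side: takes one buffered record and frees a slot; returns null if the buffer is empty. */
    String processOne() {
        return buffer.poll();
    }

    /** Number of emit attempts that were refused because the buffer was full. */
    int rejectedCount() {
        return rejected;
    }

    public static void main(String[] args) {
        BackpressureSketch pipeline = new BackpressureSketch(2);
        pipeline.emit("a");
        pipeline.emit("b");
        // The buffer is now full, so the source has to hold this record back.
        System.out.println("third record accepted: " + pipeline.emit("c"));
        // The slow operator finishes one record, which frees a slot.
        pipeline.processOne();
        System.out.println("retry accepted: " + pipeline.emit("c"));
    }
}
```

What this toy model does not tell me is how far behind the head of the stream the job has fallen, which is the part I am asking about.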

Thanks,
Michael

On 12/5/19, 7:57 AM, "Piotr Nowojski" <piotr@data-artisans.com on behalf of piotr@ververica.com> wrote:



    Hi Michael,

    As Roman pointed out, Flink currently doesn’t support auto-scaling. It’s on our roadmap, but it requires quite a bit of preliminary work first.

    Piotrek

    > On 5 Dec 2019, at 15:32, r_khachatryan <khachatryan.roman@gmail.com> wrote:
    >
    > Hi Michael
    >
    > Flink *does* detect backpressure, but currently it only propagates it back
    > to the sources, so it doesn't support auto-scaling.
    >
    > Regards,
    > Roman
    >
    >
    > Nguyen, Michael wrote
    >> How does Flink handle backpressure (caused by an increase in traffic) in a
    >> Flink job when it’s being hosted in an EMR cluster? Does Flink detect the
    >> backpressure and auto-scale the EMR cluster to handle the workload and
    >> relieve the backpressure? Once the backpressure is gone, would the EMR
    >> cluster then scale back down?
    >
    > --
    > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/