From user-return-31270-archive-asf-public=cust-asf.ponee.io@flink.apache.org Thu Dec 5 18:27:38 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 4FC1A180643 for ; Thu, 5 Dec 2019 19:27:38 +0100 (CET) Received: (qmail 58076 invoked by uid 500); 5 Dec 2019 18:27:35 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@flink.apache.org Received: (qmail 58066 invoked by uid 99); 5 Dec 2019 18:27:35 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 05 Dec 2019 18:27:35 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 4B9ED1A3453 for ; Thu, 5 Dec 2019 18:27:35 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.651 X-Spam-Level: X-Spam-Status: No, score=0.651 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.25, HTML_MESSAGE=0.2, HTTPS_HTTP_MISMATCH=0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, URI_HEX=0.1] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=ververica-com.20150623.gappssmtp.com Received: from mx1-he-de.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id ZrcdDJZxegvU for ; Thu, 5 Dec 2019 18:27:33 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::342; helo=mail-wm1-x342.google.com; envelope-from=piotr@data-artisans.com; receiver= Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTPS id 4BD227DDE9 for ; Thu, 5 Dec 2019 18:27:33 +0000 (UTC) Received: by mail-wm1-x342.google.com with SMTP id f4so7959893wmj.1 for ; Thu, 05 Dec 2019 10:27:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ververica-com.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=yhSpLiRgcgvZdEn3BhcQoIfavZso4GQbKlRwvK+uZ/k=; b=yLTZn0GWOmklEdh6effDieNDygyX2EkMPlftgNosw6wNkFctsh0U8a6ufBw9omOnXm YLHkq9MCtnNfFAtdTs96SQVrjk6SjrLO7ffeUXmCzwe0cD4dwCU/vMpRE1BgAd+jf9Ry is3HpeP2gXmBkT9Uw7osr2q0nmNrhkp/aLLutIKGwskkC4nPSRDk17NAmkMxOurQjzEU vVyl9izXO1ydaEgklaWO9ZqVcwkDMwGSyJ274jS4bdBArJDgiD+Y9V8/XguxKWVTXMuU xAjSFZZkL71pNwkkK6S2Il/JN+mO3BPwPmOqoD0vtqTy60bvm93prVyvvPLc86IxwQx9 V+IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=yhSpLiRgcgvZdEn3BhcQoIfavZso4GQbKlRwvK+uZ/k=; b=S4Y5WRZwKSaI9bxGsZzISmhxlhGmV4LO1jwvm3ajg4qq7zu4dvolSZbLD/EPVagrr9 CbnBxymMA4wqwhWaFbLlIjn1Lx2DvT3T2j7066bXv4T8SAA2rJzk1ZGxOt3OFBMWfJe2 erWNCFp7VTR/aOlYpU3ar6QDSukhle4Bn4Av+WHiM3AOniOqMtfRZmvlypt4y0QdT0rr DbM7VncxHWZf8XbFoqUGNvezwhwjSC62dPLi7kOyWPkIdU8o9uEm8crB+r+lWuNaRlWc +HaC+F4E96MLTSMyU4Q2bJLkPelUD0PZwKqsG2vv3fO+7XLamZNCLNmaXvNPV6nh8PU/ OCzA== X-Gm-Message-State: APjAAAWZKZOcBtRkfUMn9ZxdCVG6wmH04drD3dNel7H5/kUiC8VxapBU plCWXBCdJt4WinEDdq7LD4uurg== X-Google-Smtp-Source: APXvYqwHIXZhi7huDhQEfJm2yHO0Y8dxdRWHuWK0DmJ9IYF89HKsXS6rsXoC2Z3p5VkjyeDvfoKOWA== X-Received: by 2002:a1c:6745:: with SMTP id b66mr6896858wmc.173.1575570452696; Thu, 05 Dec 2019 10:27:32 -0800 (PST) Received: from piotrs-mbp.fritz.box ([79.140.122.131]) by smtp.gmail.com with ESMTPSA id h2sm13656168wrt.45.2019.12.05.10.27.31 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 05 Dec 2019 10:27:32 -0800 (PST) From: Piotr Nowojski Message-Id: <286B4E4B-E5DA-4E7C-B4DC-FCA923DEF9B8@ververica.com> Content-Type: multipart/alternative; boundary="Apple-Mail=_42183A00-10D3-417B-B239-87D887771A15" Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\)) Subject: Re: How does Flink handle backpressure in EMR Date: Thu, 5 Dec 2019 19:27:00 +0100 In-Reply-To: Cc: Khachatryan Roman , "user@flink.apache.org" To: "Nguyen, Michael" References: <1575556362299-0.post@n4.nabble.com> <02EDFA32-3088-4439-8072-922E5060E497@apache.org> <0130D8DE-9324-4C2F-819F-D69F9D8BE770@t-mobile.com> X-Mailer: Apple Mail (2.3445.104.11) --Apple-Mail=_42183A00-10D3-417B-B239-87D887771A15 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi, If you are using event time and watermarks, you can monitor the delays = using `currentInputWatermark` metric [1]. If not (or alternatively), = this blog post [2] describes how to check back pressure status [2] for = Flink up to 1.9. In Flink 1.10 there will be an additional new metric = for that [3]. Piotrek [1] = https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/deb= ugging_event_time.html = [2] https://flink.apache.org/2019/07/23/flink-network-stack-2.html = [3] https://issues.apache.org/jira/browse/FLINK-14813 = > On 5 Dec 2019, at 19:11, Nguyen, Michael = wrote: >=20 > Hi Roman, > =20 > So right now we have a couple Flink jobs that consumes data from one = Kinesis data stream. These jobs vary from a simple dump into a = PostgreSQL table to calculating anomalies in a 30 minute window. > =20 > One large scenario we were worried about was what if one of our jobs = was taking a long time to process the Kinesis stream data? How would we = detect this scenario from within our Flink job? > =20 > We do not want our Flink jobs to lag too far from the latest point in = our Kinesis stream as we are trying to deliver information in (near) = real-time. > =20 > From: Khachatryan Roman > > Date: Thursday, December 5, 2019 at 9:47 AM > To: Michael Nguyen > > Cc: Piotr Nowojski >, = "user@flink.apache.org " = > > Subject: Re: How does Flink handle backpressure in EMR > =20 > [External] > =20 > @Michael,=20 > Could you please describe your topology with which operators being = slow, back-pressured and probably skews in sources? > =20 > Regards, > Roman > =20 > =20 > On Thu, Dec 5, 2019 at 6:20 PM Nguyen, Michael = > = wrote: >> Thank you for the response Roman and Piotrek! >>=20 >> @Roman - can you clarify on what you mean when you mentioned Flink = propagating it back to the sources?=20 >>=20 >> Also, if one of my Flink operators is processing records too slowly = and is getting further away from the latest record of my source data = stream, is there a way to detect this slow processing in Flink? Would = this be detected by Flink's backpressure mechanism? >>=20 >> Thanks, >> Michael >>=20 >> On 12/5/19, 7:57 AM, "Piotr Nowojski" on behalf of piotr@ververica.com = > wrote: >>=20 >> [External] >>=20 >>=20 >> Hi Michael, >>=20 >> As Roman pointed out Flink currently doesn=E2=80=99t support the = auto-scaling. It=E2=80=99s on our roadmap but it requires quite a bit of = preliminary work to happen before. >>=20 >> Piotrek >>=20 >> > On 5 Dec 2019, at 15:32, r_khachatryan = > = wrote: >> > >> > Hi Michael >> > >> > Flink *does* detect backpressure but currently, it only = propagates it back >> > to sources. >> > And so it doesn't support auto-scaling. >> > >> > Regards, >> > Roman >> > >> > >> > Nguyen, Michael wrote >> >> How does Flink handle backpressure (caused by an increase in = traffic) in a >> >> Flink job when it=E2=80=99s being hosted in an EMR cluster? = Does Flink detect the >> >> backpressure and auto-scales the EMR cluster to handle the = workload to >> >> relieve the backpressure? Once the backpressure is gone, then = the EMR >> >> cluster would scale back down? >> > >> > >> > >> > >> > >> > -- >> > Sent from: = https://nam02.safelinks.protection.outlook.com/?url=3Dhttp%3A%2F%2Fapache-= flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&data=3D02%7C0= 1%7CMichael.Nguyen79%40t-mobile.com%7Cfda964dacbf04cc2d2cb08d7799bc347%7Cb= e0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111582216065152&sdata=3DF= IHbAWbkH7Tq8AjG%2BVSZoxNrOl0Bn6SukN%2BY1PyhSHg%3D&reserved=3D0 = --Apple-Mail=_42183A00-10D3-417B-B239-87D887771A15 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 Hi,

If = you are using event time and watermarks, you can monitor the delays = using `currentInputWatermark` metric [1]. If not (or alternatively), = this blog post [2] describes how to check back pressure status [2] for = Flink up to 1.9. In Flink 1.10 there will be an additional new metric = for that [3].

Piotrek

[3] https://issues.apache.org/jira/browse/FLINK-14813

On 5 Dec 2019, at 19:11, Nguyen, Michael <Michael.Nguyen79@T-Mobile.com> wrote:

Hi = Roman,
 
So right now we have a couple Flink jobs that consumes data = from one Kinesis data stream. These jobs vary from a simple dump into a = PostgreSQL table to calculating anomalies in a 30 minute window.
 
One large = scenario we were worried about was what if one of our jobs was taking a = long time to process the Kinesis stream data? How would we detect this = scenario from within our Flink job?
 
We do not want our Flink jobs to lag = too far from the latest point in our Kinesis stream as we are trying to = deliver information in (near) real-time.
 
From: Khachatryan Roman <khachatryan.roman@gmail.com>
Date: Thursday, December 5, = 2019 at 9:47 AM
To: Michael Nguyen <Michael.Nguyen79@T-Mobile.com>
Cc: Piotr= Nowojski <piotr@ververica.com>, "user@flink.apache.org" = <user@flink.apache.org>
Subject: Re: How does Flink = handle backpressure in EMR
 
[External]
 
@Michael, 
Could you please describe your topology = with which operators being slow, back-pressured and probably skews in = sources?
 
Regards,
Roman
 
 
On Thu, Dec 5, 2019 at = 6:20 PM Nguyen, Michael <Michael.Nguyen79@t-mobile.com> wrote:

Thank = you for the response Roman and Piotrek!

@Roman - can you clarify on what you mean when you mentioned = Flink propagating it back to the sources? 

Also, if one of my Flink operators is processing records too = slowly and is getting further away from the latest record of my source = data stream, is there a way to detect this slow processing in Flink? = Would this be detected by Flink's backpressure mechanism?

Thanks,
Michael

On 12/5/19, 7:57 AM, "Piotr Nowojski" <piotr@data-artisans.com on behalf of piotr@ververica.com> wrote:

    [External]


    Hi Michael,

  =   As Roman pointed out Flink currently doesn=E2=80=99t support the = auto-scaling. It=E2=80=99s on our roadmap but it requires quite a bit of = preliminary work to happen before.

  =   Piotrek

    > On 5 Dec = 2019, at 15:32, r_khachatryan <khachatryan.roman@gmail.com> wrote:
 =   >
    > Hi Michael
    >
    > Flink = *does* detect backpressure but currently, it only propagates it back
    > to sources.
    = > And so it doesn't support auto-scaling.
    = >
    > Regards,
  =   > Roman
    >
  =   >
    > Nguyen, Michael wrote
    >> How does Flink handle backpressure = (caused by an increase in traffic) in a
    = >> Flink job when it=E2=80=99s being hosted in an EMR cluster? = Does Flink detect the
    >> backpressure = and auto-scales the EMR cluster to handle the workload to
    >> relieve the backpressure? Once the = backpressure is gone, then the EMR
    >> = cluster would scale back down?
    >
    >
    >
    >
    >
    > --
    > Sent = from: https://nam02.safelinks.protection.outlook.com/?url=3Dhttp%3A%2= F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&am= p;data=3D02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cfda964dacbf04cc2d2cb0= 8d7799bc347%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C63711158221606515= 2&amp;sdata=3DFIHbAWbkH7Tq8AjG%2BVSZoxNrOl0Bn6SukN%2BY1PyhSHg%3D&a= mp;reserved=3D0

<= /div>
= --Apple-Mail=_42183A00-10D3-417B-B239-87D887771A15--