From: Piotr Nowojski
To: Mohammad Hosseinian
Cc: "user@flink.apache.org"
Subject: Re: Checkpoints very slow with high backpressure
Date: Wed, 31 Jul 2019 11:04:57 +0200
Message-Id: <4ABFE70A-4710-4B78-B405-24266EB9F0D1@ververica.com>

Hi,

For Flink 1.8 (and 1.9), the only thing you can do is try to limit the amount of data buffered between the nodes (check the Flink network configuration [1] for the number of buffers and/or buffer pool sizes). This can reduce maximum throughput (but only if the network transfer is a significant cost, for example if your records are extremely quick to process), but it will speed up checkpointing during back pressure.

There are some plans to address this, and maybe there will be some improvement in Flink 1.10.

If your job is completely stalled because of an outage, then I don’t think that you can do much now, since even with only one single buffered record the checkpoints will not progress. We might try to address this, but that’s further down the road.

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html

> On 30 Jul 2019, at 17:50, Mohammad Hosseinian <mohammad.hosseinian@id1.de> wrote:
>
> Hi,
>
> I'm still facing the same issue under 1.8. Our pipeline uses end-to-end
> exactly-once semantics, which means the consumer program cannot read the
> messages until they are committed. So in case of an outage, the whole
> runtime delay is passed over to the next stream processor application and
> creates an even larger delay in our processing pipeline. Is there any way to
> force the checkpoint to complete even under backpressure?
>
> Thank you in advance.
>
> Regards,
> --
> Mohammad Hosseinian
> Software Developer
> Information Design One AG
>
> Phone +49-69-244502-0
> Fax +49-69-244502-10
> Web www.id1.de
>
> Information Design One AG, Baseler Strasse 10, 60329 Frankfurt am Main, Germany
> Registration: Amtsgericht Frankfurt am Main, HRB 52596
> Executive Board: Robert Peters, Benjamin Walther, Supervisory Board: Christian Hecht

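To make the buffer-limiting advice in the reply more concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the thread. It assumes Flink 1.8's credit-based flow control, where each input channel holds a few exclusive buffers and each input gate shares some floating buffers, and it only counts the receiver side of one input gate. The key names in the comments and the default values (2, 8, 32 KiB) are my reading of the Flink 1.8 configuration page [1] and should be verified there. The function names, the channel count, and the processing rate are hypothetical.

```python
# Rough upper bound on data buffered in front of one subtask's input gate.
# Assumed Flink 1.8 settings (verify against the configuration page):
#   taskmanager.network.memory.buffers-per-channel        (assumed default 2)
#   taskmanager.network.memory.floating-buffers-per-gate  (assumed default 8)
#   taskmanager.memory.segment-size                       (assumed default 32 KiB)


def max_buffered_bytes(channels, buffers_per_channel=2,
                       floating_buffers_per_gate=8,
                       segment_size=32 * 1024):
    """Bytes that can be queued at one input gate (receiver side only)."""
    buffers = channels * buffers_per_channel + floating_buffers_per_gate
    return buffers * segment_size


def drain_seconds(buffered_bytes, bytes_per_second):
    """Time a checkpoint barrier may wait behind that data at a given rate."""
    return buffered_bytes / bytes_per_second


# Hypothetical job: 100 input channels, back-pressured down to 10 KiB/s.
default_bytes = max_buffered_bytes(channels=100)
reduced_bytes = max_buffered_bytes(channels=100, buffers_per_channel=1,
                                   floating_buffers_per_gate=2)

print(default_bytes, drain_seconds(default_bytes, 10 * 1024))
print(reduced_bytes, drain_seconds(reduced_bytes, 10 * 1024))
```

With these made-up numbers the bound drops from 6.5 MiB to roughly 3.2 MiB per gate, so a barrier has about half as much data to wait behind at each hop. The sender side buffers data as well, and a barrier has to pass every stage of the pipeline, so the real delay is larger than this single-gate estimate. As the reply notes, none of this helps when the job is fully stalled, since the drain rate is then zero.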