From user-return-19145-archive-asf-public=cust-asf.ponee.io@flink.apache.org Tue Apr 3 19:39:26 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 510AE18064D for ; Tue, 3 Apr 2018 19:39:26 +0200 (CEST) Received: (qmail 13958 invoked by uid 500); 3 Apr 2018 17:39:24 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@flink.apache.org Received: (qmail 13948 invoked by uid 99); 3 Apr 2018 17:39:24 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 Apr 2018 17:39:24 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 611A9C0047 for ; Tue, 3 Apr 2018 17:39:24 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.75 X-Spam-Level: X-Spam-Status: No, score=0.75 tagged_above=-999 required=6.31 tests=[KAM_INFOUSMEBIZ=0.75, RCVD_IN_DNSWL_NONE=-0.0001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id EOXHsdMPD36p for ; Tue, 3 Apr 2018 17:39:22 +0000 (UTC) Received: from mailin01.mxof.com (mailin01.mxof.com [198.24.62.11]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 1C8D75F19C for ; Tue, 3 Apr 2018 17:39:21 +0000 (UTC) Received: from mta01.mxof.net (mta01.mxof.net [10.1.0.31]) by mailin01.mxof.com (8.14.4/8.14.4/Debian-8+deb8u2) with ESMTP id w33HdDEZ015313 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 3 Apr 2018 10:39:13 -0700 Received: from mta01.mxof.net (localhost [127.0.0.1]) by mta01.mxof.net (Postfix) with ESMTPS id 3E047261BB2; Tue, 3 Apr 2018 10:39:13 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mta01.mxof.net (Postfix) with ESMTP id 2CD52261E27; Tue, 3 Apr 2018 10:39:13 -0700 (PDT) X-Virus-Scanned: amavisd-new at prxy.com Received: from mta01.mxof.net ([127.0.0.1]) by localhost (mta01.mxof.net [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id AVBKzHCK2mmP; Tue, 3 Apr 2018 10:39:13 -0700 (PDT) Received: from [192.168.3.171] (c-73-192-239-205.hsd1.ca.comcast.net [73.192.239.205]) by mta01.mxof.net (Postfix) with ESMTPSA id E7775261E1E; Tue, 3 Apr 2018 10:39:12 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: Multiple Async IO From: Ken Krugler In-Reply-To: Date: Tue, 3 Apr 2018 10:39:11 -0700 Cc: Maxim Parkachov Content-Transfer-Encoding: quoted-printable Message-Id: <279F0863-2A8A-49B0-B271-A7D2E07B6554@transpac.com> References: To: user X-Mailer: Apple Mail (2.3273) X-MSH-Id: 67A3649BC0CA47EFB944376C317B1B95 X-Bayes-Prob: 0.0001 (Score 0, tokens from: outgoing:default, base:default, @@RPTN) X-CanIt-Geo: No geolocation information available for 10.1.0.31 X-CanItPRO-Stream: outgoing:default (inherits from base:default) X-Canit-Stats-ID: 01VtRDdJ0 - b86af9c7fc73 - 20180403 (trained as not-spam) X-Antispam-Training-Forget: https://spamblock.prxy.com/b.php?c=f&i=01VtRDdJ0&m=b86af9c7fc73&rlm=outgoing&t=20180403 X-Antispam-Training-Nonspam: https://spamblock.prxy.com/b.php?c=n&i=01VtRDdJ0&m=b86af9c7fc73&rlm=outgoing&t=20180403 X-Antispam-Training-Phish: https://spamblock.prxy.com/b.php?c=p&i=01VtRDdJ0&m=b86af9c7fc73&rlm=outgoing&t=20180403 X-Antispam-Training-Spam: https://spamblock.prxy.com/b.php?c=s&i=01VtRDdJ0&m=b86af9c7fc73&rlm=outgoing&t=20180403 X-Scanned-By: CanIt (www . roaringpenguin . com) on 10.1.0.11 Hi Maxim, If reducing latency is the goal, then option #1 seems better. Though you=E2=80=99d need additional logic inside of your AsyncFunction = to run all 20 queries in parallel. I=E2=80=99d also consider a third option... Use a FlatMapFunction to create 20 copies of the event (assuming it=E2=80=99= s not large), with an additional field indicating which query should be = made. Follow that with a rebalance(), and a single AsyncFunction that makes = the appropriate query for the event, based on this new field. Then make sure you=E2=80=99ve got sufficient parallelism for your = AsyncFunction to handle this fan-out. This should let you run the queries for a single event in parallel. =E2=80=94 Ken > On Apr 3, 2018, at 9:59 AM, Maxim Parkachov = wrote: >=20 > Hi everyone, >=20 > I'm writing streaming job which needs to query Cassandra for each = event multiple times, around 20. I would like to use Async IO for that = but not sure which option to choose: >=20 > 1. Implement One AsyncFunction with 20 queries inside > 2. Implement 20 AsyncFunctions, each with 1 query inside >=20 > Taking into account that each event needs all queries. Reduce amount = of queries for each record is not an option.=20 >=20 > In this case I would like to minimise processing time of event, even = if throughput will suffer. Any advice or consideration is greatly = appreciated. >=20 > Thanks, > Maxim. > =20 -------------------------------------------- http://about.me/kkrugler +1 530-210-6378