Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id DECEF200C30 for ; Tue, 7 Mar 2017 14:46:22 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id DD6CC160B68; Tue, 7 Mar 2017 13:46:22 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 0CBF0160B5F for ; Tue, 7 Mar 2017 14:46:21 +0100 (CET) Received: (qmail 28310 invoked by uid 500); 7 Mar 2017 13:46:19 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 28298 invoked by uid 99); 7 Mar 2017 13:46:19 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 07 Mar 2017 13:46:19 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id A70FEC0ABA for ; Tue, 7 Mar 2017 13:46:18 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.879 X-Spam-Level: * X-Spam-Status: No, score=1.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 3GFPX_Gz5VMD for ; Tue, 7 Mar 2017 13:46:16 +0000 (UTC) Received: from mail-wm0-f46.google.com (mail-wm0-f46.google.com [74.125.82.46]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 05F505F1EE for ; Tue, 7 Mar 2017 13:46:16 +0000 (UTC) Received: by mail-wm0-f46.google.com with SMTP id n11so5232628wma.0 for ; Tue, 07 Mar 2017 05:46:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=MlhgRvCXukWxGmGFKv5Z/VSs5CsnQWjHzy1PXvqjrmg=; b=sro5Ek2iGCqpjZfil8mKO5oY9ag27bVZJHK8UgCCSWbQSpsrD5owCMEEExiiOXRPCm HWc2u8eUx7EmaCmpN/fcfcL3/8qio+WBI6L4AROKoQZ8vCtDTjcy0M4MgdYJp9VAP5V6 FeQfDV9or6vvERDItQi0MENIvCklo4yRdJ5s6PJNdJSx74wUwmIim9zYzqbV2bll3mMG 88D8BRncId42F5oJiLbecMOffVdNziQAOQ5R5SHqM1yyFxnPTzCQL5QMY78OexcxDD4I e0sI9R/K7u/eLlGrUICRXgeN8wQZhL11lbAIvtOmSTYr8YnA/+Xnrfcv0e59/ZFX8ZH3 l+Eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=MlhgRvCXukWxGmGFKv5Z/VSs5CsnQWjHzy1PXvqjrmg=; b=GQTsnQNmCGit61jwkXojuAM+6Lqw3Y7JFZpERwKWPj7/NOCjXJjsnI4CAM771r/cLM rS9HmXOSY2263GnkxSh2e4r4GbJ8MX644QEp+snS+6+rdyvRem6/TV+vbKvjkSpdjLbH SgqMMRX+HYZ9tS6lg1mYwByjVxcQwXKMR3lYYTvnn6E/IxYf/OUgWVrZbdsI5mWV08XT 4XmZ2aVvnzFLF3FNlj5CTPaYLOJbQBY0oJX/qKuzQFyR2Dmsb1zndB0ROra0uV1/opcG aPbjAe51ujCYbPCOTS0Es2MFFDQrTO1BXC8gvF38x8VlD4yQcfLmcAS7sXb+jz+nuT2k /Kmg== X-Gm-Message-State: AMke39nUwyRxV3w/rQpauzA5Vk5lqEGMbFV7SiYjQ7CyMbfpkgjSkMbt+yFE4TxSCflhLC35lXX0xJjo9YWIkg== X-Received: by 10.28.203.197 with SMTP id b188mr989884wmg.110.1488893988892; Tue, 07 Mar 2017 05:39:48 -0800 (PST) MIME-Version: 1.0 Received: by 10.223.164.130 with HTTP; Tue, 7 Mar 2017 05:39:48 -0800 (PST) In-Reply-To: References: <1488809900.61040.901922504.5647FE08@webmail.messagingengine.com> From: "wenlong.lwl" Date: Tue, 7 Mar 2017 21:39:48 +0800 Message-ID: Subject: Re: [DISCUSS] FLIP-17 Side Inputs To: dev@flink.apache.org Content-Type: multipart/alternative; boundary=94eb2c1302fe50f74e054a2425d8 archived-at: Tue, 07 Mar 2017 13:46:23 -0000 --94eb2c1302fe50f74e054a2425d8 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi Aljoscha, thank you for the proposal, it is great to hear about the progress in side input. Following is my point of view: 1. I think there may be an option to block the processing of the main input instead of buffer the data in state because in production, the through put of the main input is usually much larger, and buffering the data before the side input may slow down the preparing of side input since the i-o and computing resources are always limited. 2. another issue may need to be disscussed: how can we do checkpointing with side input, because static side input may finish soon once started which will stop the checkpointing. 3. I agree with Gyula that user should be able to determines when a side input is ready? Maybe we can do it one step further: whether users can determine a operator with multiple inputs to process which input each time or not? It would be more flexible. Best Regards! Wenlong On 7 March 2017 at 18:39, Ventura Del Monte wrote: > Hi Aljoscha, > > Thank you for the proposal and for bringing up again this discussion. > > Regarding the implementation aspect,I would say the first way could > be easier/faster to implement but it could add some overhead when > dealing with multiple side inputs through the current 2-streams union > transform. I tried the second option myself as it has less overhead > but then the outcome was something close to a N-ary operator consuming > first each side input while buffering the main one. > Therefore, I would choose the third option as it is more generic > and might help also in other scenarios, although its implementation > requires more effort. > I also agree with Gyula, I think the user should be allowed to define the > condition that determines when a side input is ready, e.g., load the side > input first, incrementally update the side input. > > Best, > Ventura > > > > > > > This message, for the D. Lgs n. 196/2003 (Privacy Code), may contain > confidential and/or privileged information. If you are not the addressee = or > authorized to receive this for the addressee, you must not use, copy, > disclose or take any action based on this message or any information > herein. If you have received this message in error, please advise the > sender immediately by reply e-mail and delete this message. Thank you for > your cooperation. > > On Mon, Mar 6, 2017 at 3:50 PM, Gyula F=C3=B3ra wr= ote: > > > Hi Aljoscha, > > > > Thank you for the nice proposal! > > > > I think it would make sense to allow user's to affect the readiness of > the > > side input. I think making it ready when the first element arrives is > only > > slightly better then making it always ready from usability perspective. > For > > instance if I am joining against a static data set I want to wait for t= he > > whole set before making it ready. This could be exposed as a user defin= ed > > condition that could also recognize bounded inputs maybe. > > > > Maybe we could also add an aggregating (merging) side input type, that > > could work as a broadcast state. > > > > What do you think? > > > > Gyula > > > > Aljoscha Krettek ezt =C3=ADrta (id=C5=91pont: 201= 7. m=C3=A1rc. > 6., > > H, 15:18): > > > > > Hi Folks, > > > > > > I would like to finally agree on a plan for implementing side inputs = in > > > Flink. There has already been an attempt to come to consensus [1], > which > > > resulted in two design documents. I tried to consolidate those two an= d > > > also added a section about implementation plans. This is the resultin= g > > > FLIP: > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP- > > 17+Side+Inputs+for+DataStream+API > > > > > > > > > In terms of semantics I tried to go with the minimal viable solution. > > > The part that needs discussing is how we want to implement this. I > > > outlined three possible implementation plans in the FLIP but what it > > > boils down to is that we need to introduce some way of getting severa= l > > > inputs into an operator/task. > > > > > > > > > Please have a look at the doc and let us know what you think. > > > > > > > > > > > > Best, > > > > > > Aljoscha > > > > > > > > > > > > [1] > > > https://lists.apache.org/thread.html/797df0ba066151b77c7951fd7d603a > > 8afd7023920d0607a0c6337db3@1462181294@%3Cdev.flink.apache.org%3E > > > > > > --94eb2c1302fe50f74e054a2425d8--