Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 71329200BB3 for ; Wed, 2 Nov 2016 23:16:10 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 6FA9A160AFB; Wed, 2 Nov 2016 22:16:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 42D21160AF0 for ; Wed, 2 Nov 2016 23:16:09 +0100 (CET) Received: (qmail 86088 invoked by uid 500); 2 Nov 2016 22:16:03 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 86072 invoked by uid 99); 2 Nov 2016 22:16:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Nov 2016 22:16:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 730AF18014A for ; Wed, 2 Nov 2016 22:16:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.629 X-Spam-Level: ** X-Spam-Status: No, score=2.629 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 1ZBoxE57L12h for ; Wed, 2 Nov 2016 22:15:57 +0000 (UTC) Received: from mail-oi0-f53.google.com (mail-oi0-f53.google.com [209.85.218.53]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id B4BB75F4EB for ; Wed, 2 Nov 2016 22:15:56 +0000 (UTC) Received: by mail-oi0-f53.google.com with SMTP id x4so41826686oix.2 for ; Wed, 02 Nov 2016 15:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=xP2zLwABb0K/Umew80gCAZJLZQbcV/rkjoNz6eWU8Tc=; b=eh0Cr4/KF9ZfG40V8Ycy27VTtn/HffK9xhBkzGcytvrPiO4V8lJJb4DRcsqeAjvSl1 ahac78Bfrv9K39ig+kob3x6Jh9oEGecDdP4lZdC616c539k9WcjJoUDw6VLPDLnn5LHo vQfcOa3RVfgeA+Zbe+mogtwgdEr+ZQ0AQGQ0ez+8UarS6EkcM5CLoImFzdFViaOIZ1CL sXdE1M5nzGCvsHQNPrax0fcnCfMUs+i2X2HDQ+R/HgXI1e7DX1jtIYNQuxqgxJLw41ix 9zAL/NaD2gCcZLOxIBOJtvifBNY/XlGzFLbh8s02jhfwAlUyvroVLfIFrRkHpCeKRWSQ irBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=xP2zLwABb0K/Umew80gCAZJLZQbcV/rkjoNz6eWU8Tc=; b=IRoMY2dvQ7ragZtmcBQQ6D8nPzxfv5uSPBvpIt//XLQPWd7h9QNgd5C+mCQA3i6p9F BOZEqXeQmJj2SPhMZ/yZaGUEHBP0CijBTkDqvdmfnpJz+dmT9BfKacxvwDbDP7xt/muQ fDRxJlAuL2Q9ypDzu4y9fjRcO8ZpmKf/tA5a07oAyAH84eScKGhf4xSI5P6LMq/dW6Lb lkZE1Fz4bJF/p5hR4H6BVL/+2Na12WzB68s3DxIc3wAxjHR609wRKjXZmMtzLcWoxHni 87AFctxlyTE59S7ioqGHqI+bq1eV7GpAwD0K+lVbBZkEYZ4xHzExA8pN5n+WrG1Mvbxo MgLQ== X-Gm-Message-State: ABUngveXKIH+NsICXZdMU3Cg17jYBG7DiCCvhdFrfnmk+dt/ARsPKMCj0C16f8drIZzDMkXEk/VLD4fI17WNaQ== X-Received: by 10.107.133.76 with SMTP id h73mr7137489iod.148.1478124955044; Wed, 02 Nov 2016 15:15:55 -0700 (PDT) MIME-Version: 1.0 References: <8B754047F81D6B4290B9F4CE928333A517B30C01@lhreml503-mbx> <8B754047F81D6B4290B9F4CE928333A517B394DC@lhreml503-mbx> <8B754047F81D6B4290B9F4CE928333A517B398CB@lhreml503-mbx> In-Reply-To: From: Manu Zhang Date: Wed, 02 Nov 2016 22:15:44 +0000 Message-ID: Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata To: "dev@flink.apache.org" Content-Type: multipart/alternative; boundary=001a113ec556e11d9f054058c826 archived-at: Wed, 02 Nov 2016 22:16:10 -0000 --001a113ec556e11d9f054058c826 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable No problem, Aljoscha. I'll follow up with my use cases. Thanks, Manu On Wed, Nov 2, 2016 at 9:15 PM Aljoscha Krettek wrote= : > Hi Manu, > it's great that you want to work on this but another contributor has also > started looking into this and has some code already. It's unfortunate tha= t > there is no Jira issue yet for this but I think he'll open one this week. > > Cheers, > Aljoscha > > On Wed, 2 Nov 2016 at 13:00 Manu Zhang wrote: > > > Hi Aljoscha, > > > > Have you started working on ProcessWindowFunction ? If not, may I take > this > > task ? > > > > Thanks, > > Manu > > > > > > On Wed, Nov 2, 2016 at 5:16 PM Aljoscha Krettek > > wrote: > > > > > I think we reached consensus here so I would like to mark this FLIP a= s > > > accepted. We will now process with implementing the first step, i.e. > > adding > > > the new ProcessWindowFunction. > > > > > > On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek > > wrote: > > > > > > > Alright, that seems reasonable. I updated the doc to add the > Collector > > to > > > > the method signature again. > > > > > > > > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen wrote: > > > > > > > > The Collector is pretty integral in all the other functions that > return > > > > multiple elements. I honestly don't see us switching away from it, > > given > > > > that it is such a core part of the API. > > > > > > > > The close() method has, to my best knowledge, not caused issues, > yet. I > > > > cannot recall anyone mentioning that the close() method confused > them, > > > they > > > > accidentally called it, etc. > > > > I am wondering whether this is more of a theoretical than practical > > > issue. > > > > > > > > If we move one function away from Collector to be on the safe side > for > > a > > > > "maybe change in the future", while keeping Collector in all other > > > > functions - I think that fragments the API concept wise more than i= t > > > > improves anything. > > > > > > > > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek < > aljoscha@apache.org > > > > > > > wrote: > > > > > > > > > @Stephan: For the Output, should we keep using a Collector (which > > > > exposes) > > > > > the close() method which should never be called by users or creat= e > a > > > new > > > > > Output type that only has an "output" method. Collector can also = be > > > used > > > > > but with a close() method that doesn't do anything. In the long > run, > > I > > > > > thought it might be better to switch the type away from Collector= . > > > > > > > > > > Cheers, > > > > > Aljoscha > > > > > > > > > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels > > > wrote: > > > > > > > > > > > I think it looks like Beam rather than Hadoop :) > > > > > > > > > > > > What Stephan meant was that he wanted a dedicated output method > in > > > the > > > > > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't > > expose > > > > > > the collector. > > > > > > > > > > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek < > > > > aljoscha@apache.org> > > > > > > wrote: > > > > > > > You mean keep the Collector? I don't like that one because it > has > > > the > > > > > > > close() method that should never be called by the user. > > > > > > > > > > > > > > We can keep it, though, because all the other user function > > > > interfaces > > > > > > also > > > > > > > expose it to the user. > > > > > > > > > > > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen > > > wrote: > > > > > > > > > > > > > >> I would actually make the output a separate parameter as wel= l. > > > > Pretty > > > > > > much > > > > > > >> like the old variant, only replacing the "Window" parameter = by > > the > > > > > > context > > > > > > >> (which contains everything about the window). > > > > > > >> It could also be called "WindowInvocationContext" or so. > > > > > > >> > > > > > > >> The current variant looks too Hadoop to me ;-) Everything do= ne > > on > > > > the > > > > > > >> context object, and messy mocking when creating tests. > > > > > > >> > > > > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran < > > > > > radu.tudoran@huawei.com> > > > > > > >> wrote: > > > > > > >> > > > > > > >> > Hi, > > > > > > >> > > > > > > > >> > Sorry - I made a mistake - I was thinking of getting acces= s > to > > > the > > > > > > >> > collection (mist-read :) collector) of events in the windo= w > > > buffer > > > > > in > > > > > > >> > order to be able to delete/evict some of them which are no= t > > > > > necessary > > > > > > the > > > > > > >> > last ones. > > > > > > >> > > > > > > > >> > > > > > > > >> > Radu > > > > > > >> > > > > > > > >> > -----Original Message----- > > > > > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org] > > > > > > >> > Sent: Monday, July 18, 2016 5:54 PM > > > > > > >> > To: dev@flink.apache.org > > > > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function > > Metadata > > > > > > >> > > > > > > > >> > What about the collector? This is only used for emitting > > > elements > > > > to > > > > > > the > > > > > > >> > downstream operation. > > > > > > >> > > > > > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran < > > > > radu.tudoran@huawei.com> > > > > > > >> wrote: > > > > > > >> > > > > > > > >> > > Hi, > > > > > > >> > > > > > > > > >> > > I think it looks good and most importantly is that we ca= n > > > extend > > > > > it > > > > > > in > > > > > > >> > > the directions discussed so far. > > > > > > >> > > > > > > > > >> > > One question though regarding the Collector - are we goi= ng > > to > > > be > > > > > > able > > > > > > >> > > to delete random elements from the list if this is not > > exposed > > > > as > > > > > a > > > > > > >> > > collection, at least to the evictor? If not, how are we > > going > > > to > > > > > > >> > > extend in the future to cover this case? > > > > > > >> > > > > > > > > >> > > Regarding the ordering - I also observed that there are > > > > situations > > > > > > >> > > where elements do not have a logical order. One example = is > > if > > > > you > > > > > > have > > > > > > >> > > high rates of the events. Nevertheless, even if now is n= ot > > the > > > > > time > > > > > > >> > > for this, I think in the future we can imagine having al= so > > > some > > > > > data > > > > > > >> > > structures that offer some ordering. It can save some > > > > computation > > > > > > >> > > efforts later in the functions for some use cases. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > Best regards, > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > -----Original Message----- > > > > > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org] > > > > > > >> > > Sent: Monday, July 18, 2016 3:45 PM > > > > > > >> > > To: dev@flink.apache.org > > > > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function > > > Metadata > > > > > > >> > > > > > > > > >> > > I incorporated the changes. The proposed interface of > > > > > > >> > > ProcessWindowFunction is now this: > > > > > > >> > > > > > > > > >> > > public abstract class ProcessWindowFunction KEY, W > > > > > extends > > > > > > >> > > Window> implements Function { > > > > > > >> > > > > > > > > >> > > public abstract void process(KEY key, Iterable > > > elements, > > > > > > >> > > Context > > > > > > >> > > ctx) throws Exception; > > > > > > >> > > > > > > > > >> > > public abstract class Context { > > > > > > >> > > public abstract W window(); > > > > > > >> > > public abstract void output(OUT value); > > > > > > >> > > } > > > > > > >> > > } > > > > > > >> > > > > > > > > >> > > I'm proposing to not expose Collector anymore because it > has > > > the > > > > > > >> > > close() method that should not be called by users. Havin= g > > the > > > > > > output() > > > > > > >> > > call directly on the context should work just as well. > > > > > > >> > > > > > > > > >> > > Also, I marked the "adding a firing reason" and "adding > > firing > > > > > > >> > > counter" as future work that are only examples of stuff > that > > > can > > > > > be > > > > > > >> > > implemented on top of the new interface. Initially, this > > will > > > > > > provide > > > > > > >> > > exactly the same features as the old API but be > extensible. > > I > > > > did > > > > > > this > > > > > > >> > > to not make the scope of this proposal to big because Ra= du > > > also > > > > > > >> > > suggested more changes and each of them should be covere= d > > in a > > > > > > separate > > > > > > >> > design doc or FLIP. > > > > > > >> > > > > > > > > >> > > @Radu: On the different buffer types. I think this would > be > > > very > > > > > > >> tricky. > > > > > > >> > > Right now, people should also not rely on the fact that > > > elements > > > > > are > > > > > > >> > > "FIFO". Some state backends might keep the elements in a > > > > different > > > > > > >> > > order and when you have merging windows/session windows > the > > > > order > > > > > of > > > > > > >> > > the elements will also not be preserved. > > > > > > >> > > > > > > > > >> > > Cheers, > > > > > > >> > > Aljoscha > > > > > > >> > > > > > > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran < > > > > > radu.tudoran@huawei.com> > > > > > > >> > wrote: > > > > > > >> > > > > > > > > >> > > > Hi, > > > > > > >> > > > > > > > > > >> > > > If it is to extend the Context to pass more informatio= n > > > > between > > > > > > the > > > > > > >> > > > stages of processing a window (triggering -> process -= > > > > > > eviction), > > > > > > >> > > > why not adding also a "EvictionInfo"? I think this mig= ht > > > > > actually > > > > > > >> > > > help with the issues discussed in the tread related to > the > > > > > > eviction > > > > > > >> > policy. > > > > > > >> > > > I could imagine using this parameter to pass the > > conditions, > > > > > from > > > > > > >> > > > the processing stage to the evictor, about what events > to > > be > > > > > > >> > eliminated. > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > public abstract class Context { > > > > > > >> > > > > > > > > > >> > > > public abstract EvictionInfo evictionInfo(); > > > > > > >> > > > > > > > > > >> > > > ... > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > public abstract KEY key(); > > > > > > >> > > > > > > > > > >> > > > public abstract W window(); > > > > > > >> > > > > > > > > > >> > > > public abstract int id(); > > > > > > >> > > > > > > > > > >> > > > public abstract FiringInfo firingInfo(); > > > > > > >> > > > > > > > > > >> > > > public abstract Iterable elements(); > > > > > > >> > > > > > > > > > >> > > > public abstract void output(OUT value); > > > > > > >> > > > > > > > > > >> > > > } > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > Also on a slightly unrelated issue - how hard it would > be > > to > > > > > > >> > > > introduce different types of buffers for the windows. > > > > Currently > > > > > > the > > > > > > >> > > > existing one is behaving (when under processing) simil= ar > > > with > > > > a > > > > > > FIFO > > > > > > >> > > > queue (in the sense that you need to start from > beginning, > > > > from > > > > > > the > > > > > > >> > oldest element). > > > > > > >> > > > How about enabling for example also LIFO behavior (sta= rt > > > > > iterating > > > > > > >> > > > through the list from the most recent element). As in > the > > > > source > > > > > > >> > > > queues or stacks are not actually used, perhaps we can > > just > > > > pass > > > > > > >> > > > policies to the iterator - or have custom itrators > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > Dr. Radu Tudoran > > > > > > >> > > > Research Engineer - Big Data Expert > > > > > > >> > > > IT R&D Division > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research > > > Center > > > > > > >> > > > Riesstrasse 25, 80992 M=C3=BCnchen > > > > > > >> > > > > > > > > > >> > > > E-mail: radu.tudoran@huawei.com > > > > > > >> > > > Mobile: +49 15209084330 <+49%201520%209084330> > <01520%209084330> > > <+49%201520%209084330> > > > <01520%209084330> > > > > > > >> > > > Telephone: +49 891588344173 <+49%2089%201588344173> > <089%201588344173> > > <+49%2089%201588344173> > > > <089%201588344173> > > > > > > >> > > > > > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205, > 40549 > > > > > > >> > > > D=C3=BCsseldorf, Germany, www.huawei.com Registered > > > > > > >> > > > Office: D=C3=BCsseldorf, Register Court D=C3=BCsseldor= f, HRB > 56063, > > > > > Managing > > > > > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der > > > > > > Gesellschaft: > > > > > > >> > > > D=C3=BCsseldorf, Amtsgericht D=C3=BCsseldorf, HRB 5606= 3, > > > > > > >> > > > Gesch=C3=A4ftsf=C3=BChrer: Bo PENG, Wanzhou MENG, Lifa= ng CHEN This > > > > e-mail > > > > > > and > > > > > > >> > > > its attachments contain confidential information from > > > HUAWEI, > > > > > > which > > > > > > >> > > > is intended only for the person or entity whose addres= s > is > > > > > listed > > > > > > >> > above. > > > > > > >> > > > Any use of the information contained herein in any way > > > > > (including, > > > > > > >> > > > but not limited to, total or partial disclosure, > > > reproduction, > > > > > or > > > > > > >> > > > dissemination) by persons other than the intended > > > recipient(s) > > > > > is > > > > > > >> > > > prohibited. If you receive this e-mail in error, pleas= e > > > notify > > > > > the > > > > > > >> > > > sender by phone or email immediately and delete it! > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > -----Original Message----- > > > > > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org] > > > > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM > > > > > > >> > > > To: dev@flink.apache.org > > > > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Functio= n > > > > Metadata > > > > > > >> > > > > > > > > > >> > > > Sure, I also thought about this but went for the > "extreme" > > > > > > initially. > > > > > > >> > > > If no-one objects I'll update the doc in a bit. > > > > > > >> > > > > > > > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen < > > sewen@apache.org > > > > > > > > > > wrote: > > > > > > >> > > > > > > > > > >> > > > > Thanks for opening this. > > > > > > >> > > > > > > > > > > >> > > > > I see the need for having an extensible context obje= ct > > for > > > > > > window > > > > > > >> > > > > function invocations, but i think hiding every > parameter > > > in > > > > > the > > > > > > >> > > > > context is a bit unnatural. > > > > > > >> > > > > > > > > > > >> > > > > How about having a function "apply(Key, Values, > > > > WindowContext, > > > > > > >> > > > Collector)" > > > > > > >> > > > > ? > > > > > > >> > > > > It should be possible to write the straightforward u= se > > > cases > > > > > > >> > > > > without accessing the context object. > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek > > > > > > >> > > > > > > > > > > >> > > > > wrote: > > > > > > >> > > > > > > > > > > >> > > > > > Hi, > > > > > > >> > > > > > this is a proposal to introduce a new interface fo= r > > the > > > > > window > > > > > > >> > > > > > function > > > > > > >> > > > > to > > > > > > >> > > > > > make it more extensible for the future where we > might > > > want > > > > > to > > > > > > >> > > > > > provide additional information about why a window > > fired > > > to > > > > > the > > > > > > >> > > > > > user > > > > > > >> > > > function: > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending > > > > > > >> > > > > +W > > > > > > >> > > > > in > > > > > > >> > > > > dow+Function+Metadata > > > > > > >> > > > > > > > > > > > >> > > > > > I'd appreciate your thoughts! > > > > > > >> > > > > > > > > > > > >> > > > > > Cheers, > > > > > > >> > > > > > Aljoscha > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > --001a113ec556e11d9f054058c826--