flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
Date Wed, 02 Nov 2016 13:14:57 GMT
Hi Manu,
it's great that you want to work on this but another contributor has also
started looking into this and has some code already. It's unfortunate that
there is no Jira issue yet for this but I think he'll open one this week.

Cheers,
Aljoscha

On Wed, 2 Nov 2016 at 13:00 Manu Zhang <owenzhang1990@gmail.com> wrote:

> Hi Aljoscha,
>
> Have you started working on ProcessWindowFunction ? If not, may I take this
> task ?
>
> Thanks,
> Manu
>
>
> On Wed, Nov 2, 2016 at 5:16 PM Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
> > I think we reached consensus here so I would like to mark this FLIP as
> > accepted. We will now process with implementing the first step, i.e.
> adding
> > the new ProcessWindowFunction.
> >
> > On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <aljoscha@apache.org>
> wrote:
> >
> > > Alright, that seems reasonable. I updated the doc to add the Collector
> to
> > > the method signature again.
> > >
> > > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <sewen@apache.org> wrote:
> > >
> > > The Collector is pretty integral in all the other functions that return
> > > multiple elements. I honestly don't see us switching away from it,
> given
> > > that it is such a core part of the API.
> > >
> > > The close() method has, to my best knowledge, not caused issues, yet. I
> > > cannot recall anyone mentioning that the close() method confused them,
> > they
> > > accidentally called it, etc.
> > > I am wondering whether this is more of a theoretical than practical
> > issue.
> > >
> > > If we move one function away from Collector to be on the safe side for
> a
> > > "maybe change in the future", while keeping Collector in all other
> > > functions - I think that fragments the API concept wise more than it
> > > improves anything.
> > >
> > > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <aljoscha@apache.org
> >
> > > wrote:
> > >
> > > > @Stephan: For the Output, should we keep using a Collector (which
> > > exposes)
> > > > the close() method which should never be called by users or create a
> > new
> > > > Output type that only has an "output" method. Collector can also be
> > used
> > > > but with a close() method that doesn't do anything. In the long run,
> I
> > > > thought it might be better to switch the type away from Collector.
> > > >
> > > > Cheers,
> > > > Aljoscha
> > > >
> > > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mxm@apache.org>
> > wrote:
> > > >
> > > > > I think it looks like Beam rather than Hadoop :)
> > > > >
> > > > > What Stephan meant was that he wanted a dedicated output method in
> > the
> > > > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't
> expose
> > > > > the collector.
> > > > >
> > > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek <
> > > aljoscha@apache.org>
> > > > > wrote:
> > > > > > You mean keep the Collector? I don't like that one because it
has
> > the
> > > > > > close() method that should never be called by the user.
> > > > > >
> > > > > > We can keep it, though, because all the other user function
> > > interfaces
> > > > > also
> > > > > > expose it to the user.
> > > > > >
> > > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <sewen@apache.org>
> > wrote:
> > > > > >
> > > > > >> I would actually make the output a separate parameter as
well.
> > > Pretty
> > > > > much
> > > > > >> like the old variant, only replacing the "Window" parameter
by
> the
> > > > > context
> > > > > >> (which contains everything about the window).
> > > > > >> It could also be called "WindowInvocationContext" or so.
> > > > > >>
> > > > > >> The current variant looks too Hadoop to me ;-) Everything
done
> on
> > > the
> > > > > >> context object, and messy mocking when creating tests.
> > > > > >>
> > > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran <
> > > > radu.tudoran@huawei.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > Sorry - I made a mistake - I was thinking of getting
access to
> > the
> > > > > >> > collection (mist-read :) collector) of events in the
window
> > buffer
> > > > in
> > > > > >> > order to be able to delete/evict some of them which
are not
> > > > necessary
> > > > > the
> > > > > >> > last ones.
> > > > > >> >
> > > > > >> >
> > > > > >> > Radu
> > > > > >> >
> > > > > >> > -----Original Message-----
> > > > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > >> > Sent: Monday, July 18, 2016 5:54 PM
> > > > > >> > To: dev@flink.apache.org
> > > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> Metadata
> > > > > >> >
> > > > > >> > What about the collector? This is only used for emitting
> > elements
> > > to
> > > > > the
> > > > > >> > downstream operation.
> > > > > >> >
> > > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran <
> > > radu.tudoran@huawei.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hi,
> > > > > >> > >
> > > > > >> > > I think it looks good and most importantly is
that we can
> > extend
> > > > it
> > > > > in
> > > > > >> > > the directions discussed so far.
> > > > > >> > >
> > > > > >> > > One question though regarding the Collector -
are we going
> to
> > be
> > > > > able
> > > > > >> > > to delete random elements from the list if this
is not
> exposed
> > > as
> > > > a
> > > > > >> > > collection, at least to the evictor? If not, how
are we
> going
> > to
> > > > > >> > > extend in the future to cover this case?
> > > > > >> > >
> > > > > >> > > Regarding the ordering - I also observed that
there are
> > > situations
> > > > > >> > > where elements do not have a logical order. One
example is
> if
> > > you
> > > > > have
> > > > > >> > > high rates of the events. Nevertheless, even if
now is not
> the
> > > > time
> > > > > >> > > for this, I think in the future we can imagine
having also
> > some
> > > > data
> > > > > >> > > structures that offer some ordering. It can save
some
> > > computation
> > > > > >> > > efforts later in the functions for some use cases.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Best regards,
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > -----Original Message-----
> > > > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > > > > >> > > To: dev@flink.apache.org
> > > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window
Function
> > Metadata
> > > > > >> > >
> > > > > >> > > I incorporated the changes. The proposed interface
of
> > > > > >> > > ProcessWindowFunction is now this:
> > > > > >> > >
> > > > > >> > > public abstract class ProcessWindowFunction <IN,
OUT, KEY, W
> > > > extends
> > > > > >> > > Window> implements Function {
> > > > > >> > >
> > > > > >> > >     public abstract void process(KEY key, Iterable<IN>
> > elements,
> > > > > >> > > Context
> > > > > >> > > ctx) throws Exception;
> > > > > >> > >
> > > > > >> > >     public abstract class Context {
> > > > > >> > >         public abstract W window();
> > > > > >> > >         public abstract void output(OUT value);
> > > > > >> > >     }
> > > > > >> > > }
> > > > > >> > >
> > > > > >> > > I'm proposing to not expose Collector anymore
because it has
> > the
> > > > > >> > > close() method that should not be called by users.
Having
> the
> > > > > output()
> > > > > >> > > call directly on the context should work just
as well.
> > > > > >> > >
> > > > > >> > > Also, I marked the "adding a firing reason" and
"adding
> firing
> > > > > >> > > counter" as future work that are only examples
of stuff that
> > can
> > > > be
> > > > > >> > > implemented on top of the new interface. Initially,
this
> will
> > > > > provide
> > > > > >> > > exactly the same features as the old API but be
extensible.
> I
> > > did
> > > > > this
> > > > > >> > > to not make the scope of this proposal to big
because Radu
> > also
> > > > > >> > > suggested more changes and each of them should
be covered
> in a
> > > > > separate
> > > > > >> > design doc or FLIP.
> > > > > >> > >
> > > > > >> > > @Radu: On the different buffer types. I think
this would be
> > very
> > > > > >> tricky.
> > > > > >> > > Right now, people should also not rely on the
fact that
> > elements
> > > > are
> > > > > >> > > "FIFO". Some state backends might keep the elements
in a
> > > different
> > > > > >> > > order and when you have merging windows/session
windows the
> > > order
> > > > of
> > > > > >> > > the elements will also not be preserved.
> > > > > >> > >
> > > > > >> > > Cheers,
> > > > > >> > > Aljoscha
> > > > > >> > >
> > > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran <
> > > > radu.tudoran@huawei.com>
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Hi,
> > > > > >> > > >
> > > > > >> > > > If it is to extend the Context to pass more
information
> > > between
> > > > > the
> > > > > >> > > > stages of processing a window (triggering
-> process ->
> > > > eviction),
> > > > > >> > > > why not adding also a "EvictionInfo"? I think
this might
> > > > actually
> > > > > >> > > > help with the issues discussed in the tread
related to the
> > > > > eviction
> > > > > >> > policy.
> > > > > >> > > > I could imagine using this parameter to pass
the
> conditions,
> > > > from
> > > > > >> > > > the processing stage to the evictor, about
what events to
> be
> > > > > >> > eliminated.
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > public abstract class Context {
> > > > > >> > > >
> > > > > >> > > >    public abstract EvictionInfo evictionInfo();
> > > > > >> > > >
> > > > > >> > > > ...
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >    public abstract KEY key();
> > > > > >> > > >
> > > > > >> > > >    public abstract W window();
> > > > > >> > > >
> > > > > >> > > >    public abstract int id();
> > > > > >> > > >
> > > > > >> > > >    public abstract FiringInfo firingInfo();
> > > > > >> > > >
> > > > > >> > > >    public abstract Iterable<IN> elements();
> > > > > >> > > >
> > > > > >> > > >    public abstract void output(OUT value);
> > > > > >> > > >
> > > > > >> > > > }
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > Also on a slightly unrelated issue - how
hard it would be
> to
> > > > > >> > > > introduce different types of buffers for
the windows.
> > > Currently
> > > > > the
> > > > > >> > > > existing one is behaving (when under processing)
similar
> > with
> > > a
> > > > > FIFO
> > > > > >> > > > queue (in the sense that you need to start
from beginning,
> > > from
> > > > > the
> > > > > >> > oldest element).
> > > > > >> > > > How about enabling for example also LIFO
behavior (start
> > > > iterating
> > > > > >> > > > through the list from the most recent element).
As in the
> > > source
> > > > > >> > > > queues or stacks are not actually used, perhaps
we can
> just
> > > pass
> > > > > >> > > > policies to the iterator - or have custom
itrators
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > Dr. Radu Tudoran
> > > > > >> > > > Research Engineer - Big Data Expert
> > > > > >> > > > IT R&D Division
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European
Research
> > Center
> > > > > >> > > > Riesstrasse 25, 80992 München
> > > > > >> > > >
> > > > > >> > > > E-mail: radu.tudoran@huawei.com
> > > > > >> > > > Mobile: +49 15209084330 <01520%209084330>
> <+49%201520%209084330>
> > <01520%209084330>
> > > > > >> > > > Telephone: +49 891588344173 <089%201588344173>
> <+49%2089%201588344173>
> > <089%201588344173>
> > > > > >> > > >
> > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee
205, 40549
> > > > > >> > > > Düsseldorf, Germany, www.huawei.com Registered
> > > > > >> > > > Office: Düsseldorf, Register Court Düsseldorf,
HRB 56063,
> > > > Managing
> > > > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der
> > > > > Gesellschaft:
> > > > > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB
56063,
> > > > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG,
Lifang CHEN This
> > > e-mail
> > > > > and
> > > > > >> > > > its attachments contain confidential information
from
> > HUAWEI,
> > > > > which
> > > > > >> > > > is intended only for the person or entity
whose address is
> > > > listed
> > > > > >> > above.
> > > > > >> > > > Any use of the information contained herein
in any way
> > > > (including,
> > > > > >> > > > but not limited to, total or partial disclosure,
> > reproduction,
> > > > or
> > > > > >> > > > dissemination) by persons other than the
intended
> > recipient(s)
> > > > is
> > > > > >> > > > prohibited. If you receive this e-mail in
error, please
> > notify
> > > > the
> > > > > >> > > > sender by phone or email immediately and
delete it!
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > -----Original Message-----
> > > > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > > > > >> > > > To: dev@flink.apache.org
> > > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window
Function
> > > Metadata
> > > > > >> > > >
> > > > > >> > > > Sure, I also thought about this but went
for the "extreme"
> > > > > initially.
> > > > > >> > > > If no-one objects I'll update the doc in
a bit.
> > > > > >> > > >
> > > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen
<
> sewen@apache.org
> > >
> > > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Thanks for opening this.
> > > > > >> > > > >
> > > > > >> > > > > I see the need for having an extensible
context object
> for
> > > > > window
> > > > > >> > > > > function invocations, but i think hiding
every parameter
> > in
> > > > the
> > > > > >> > > > > context is a bit unnatural.
> > > > > >> > > > >
> > > > > >> > > > > How about having a function "apply(Key,
Values,
> > > WindowContext,
> > > > > >> > > > Collector)"
> > > > > >> > > > > ?
> > > > > >> > > > > It should be possible to write the straightforward
use
> > cases
> > > > > >> > > > > without accessing the context object.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha
Krettek
> > > > > >> > > > > <aljoscha@apache.org>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi,
> > > > > >> > > > > > this is a proposal to introduce
a new interface for
> the
> > > > window
> > > > > >> > > > > > function
> > > > > >> > > > > to
> > > > > >> > > > > > make it more extensible for the
future where we might
> > want
> > > > to
> > > > > >> > > > > > provide additional information
about why a window
> fired
> > to
> > > > the
> > > > > >> > > > > > user
> > > > > >> > > > function:
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending
> > > > > >> > > > > +W
> > > > > >> > > > > in
> > > > > >> > > > > dow+Function+Metadata
> > > > > >> > > > > >
> > > > > >> > > > > > I'd appreciate your thoughts!
> > > > > >> > > > > >
> > > > > >> > > > > > Cheers,
> > > > > >> > > > > > Aljoscha
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message