flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
Date Mon, 01 Aug 2016 07:59:19 GMT
The Collector is pretty integral in all the other functions that return
multiple elements. I honestly don't see us switching away from it, given
that it is such a core part of the API.

The close() method has, to my best knowledge, not caused issues, yet. I
cannot recall anyone mentioning that the close() method confused them, they
accidentally called it, etc.
I am wondering whether this is more of a theoretical than practical issue.

If we move one function away from Collector to be on the safe side for a
"maybe change in the future", while keeping Collector in all other
functions - I think that fragments the API concept wise more than it
improves anything.

On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> @Stephan: For the Output, should we keep using a Collector (which exposes)
> the close() method which should never be called by users or create a new
> Output type that only has an "output" method. Collector can also be used
> but with a close() method that doesn't do anything. In the long run, I
> thought it might be better to switch the type away from Collector.
>
> Cheers,
> Aljoscha
>
> On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mxm@apache.org> wrote:
>
> > I think it looks like Beam rather than Hadoop :)
> >
> > What Stephan meant was that he wanted a dedicated output method in the
> > ProcessWindowFunction. I agree with Aljoscha that we shouldn't expose
> > the collector.
> >
> > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> > > You mean keep the Collector? I don't like that one because it has the
> > > close() method that should never be called by the user.
> > >
> > > We can keep it, though, because all the other user function interfaces
> > also
> > > expose it to the user.
> > >
> > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <sewen@apache.org> wrote:
> > >
> > >> I would actually make the output a separate parameter as well. Pretty
> > much
> > >> like the old variant, only replacing the "Window" parameter by the
> > context
> > >> (which contains everything about the window).
> > >> It could also be called "WindowInvocationContext" or so.
> > >>
> > >> The current variant looks too Hadoop to me ;-) Everything done on the
> > >> context object, and messy mocking when creating tests.
> > >>
> > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran <
> radu.tudoran@huawei.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > Sorry - I made a mistake - I was thinking of getting access to the
> > >> > collection (mist-read :) collector) of events in the window buffer
> in
> > >> > order to be able to delete/evict some of them which are not
> necessary
> > the
> > >> > last ones.
> > >> >
> > >> >
> > >> > Radu
> > >> >
> > >> > -----Original Message-----
> > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > >> > Sent: Monday, July 18, 2016 5:54 PM
> > >> > To: dev@flink.apache.org
> > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > >> >
> > >> > What about the collector? This is only used for emitting elements
to
> > the
> > >> > downstream operation.
> > >> >
> > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran <radu.tudoran@huawei.com>
> > >> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > I think it looks good and most importantly is that we can extend
> it
> > in
> > >> > > the directions discussed so far.
> > >> > >
> > >> > > One question though regarding the Collector - are we going to
be
> > able
> > >> > > to delete random elements from the list if this is not exposed
as
> a
> > >> > > collection, at least to the evictor? If not, how are we going
to
> > >> > > extend in the future to cover this case?
> > >> > >
> > >> > > Regarding the ordering - I also observed that there are situations
> > >> > > where elements do not have a logical order. One example is if
you
> > have
> > >> > > high rates of the events. Nevertheless, even if now is not the
> time
> > >> > > for this, I think in the future we can imagine having also some
> data
> > >> > > structures that offer some ordering. It can save some computation
> > >> > > efforts later in the functions for some use cases.
> > >> > >
> > >> > >
> > >> > > Best regards,
> > >> > >
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > >> > > To: dev@flink.apache.org
> > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > >> > >
> > >> > > I incorporated the changes. The proposed interface of
> > >> > > ProcessWindowFunction is now this:
> > >> > >
> > >> > > public abstract class ProcessWindowFunction <IN, OUT, KEY,
W
> extends
> > >> > > Window> implements Function {
> > >> > >
> > >> > >     public abstract void process(KEY key, Iterable<IN>
elements,
> > >> > > Context
> > >> > > ctx) throws Exception;
> > >> > >
> > >> > >     public abstract class Context {
> > >> > >         public abstract W window();
> > >> > >         public abstract void output(OUT value);
> > >> > >     }
> > >> > > }
> > >> > >
> > >> > > I'm proposing to not expose Collector anymore because it has
the
> > >> > > close() method that should not be called by users. Having the
> > output()
> > >> > > call directly on the context should work just as well.
> > >> > >
> > >> > > Also, I marked the "adding a firing reason" and "adding firing
> > >> > > counter" as future work that are only examples of stuff that
can
> be
> > >> > > implemented on top of the new interface. Initially, this will
> > provide
> > >> > > exactly the same features as the old API but be extensible. I
did
> > this
> > >> > > to not make the scope of this proposal to big because Radu also
> > >> > > suggested more changes and each of them should be covered in
a
> > separate
> > >> > design doc or FLIP.
> > >> > >
> > >> > > @Radu: On the different buffer types. I think this would be very
> > >> tricky.
> > >> > > Right now, people should also not rely on the fact that elements
> are
> > >> > > "FIFO". Some state backends might keep the elements in a different
> > >> > > order and when you have merging windows/session windows the order
> of
> > >> > > the elements will also not be preserved.
> > >> > >
> > >> > > Cheers,
> > >> > > Aljoscha
> > >> > >
> > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran <
> radu.tudoran@huawei.com>
> > >> > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > If it is to extend the Context to pass more information
between
> > the
> > >> > > > stages of processing a window (triggering -> process
->
> eviction),
> > >> > > > why not adding also a "EvictionInfo"? I think this might
> actually
> > >> > > > help with the issues discussed in the tread related to the
> > eviction
> > >> > policy.
> > >> > > > I could imagine using this parameter to pass the conditions,
> from
> > >> > > > the processing stage to the evictor, about what events to
be
> > >> > eliminated.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > public abstract class Context {
> > >> > > >
> > >> > > >    public abstract EvictionInfo evictionInfo();
> > >> > > >
> > >> > > > ...
> > >> > > >
> > >> > > >
> > >> > > >    public abstract KEY key();
> > >> > > >
> > >> > > >    public abstract W window();
> > >> > > >
> > >> > > >    public abstract int id();
> > >> > > >
> > >> > > >    public abstract FiringInfo firingInfo();
> > >> > > >
> > >> > > >    public abstract Iterable<IN> elements();
> > >> > > >
> > >> > > >    public abstract void output(OUT value);
> > >> > > >
> > >> > > > }
> > >> > > >
> > >> > > >
> > >> > > > Also on a slightly unrelated issue - how hard it would be
to
> > >> > > > introduce different types of buffers for the windows. Currently
> > the
> > >> > > > existing one is behaving (when under processing) similar
with a
> > FIFO
> > >> > > > queue (in the sense that you need to start from beginning,
from
> > the
> > >> > oldest element).
> > >> > > > How about enabling for example also LIFO behavior (start
> iterating
> > >> > > > through the list from the most recent element). As in the
source
> > >> > > > queues or stacks are not actually used, perhaps we can just
pass
> > >> > > > policies to the iterator - or have custom itrators
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > Dr. Radu Tudoran
> > >> > > > Research Engineer - Big Data Expert
> > >> > > > IT R&D Division
> > >> > > >
> > >> > > >
> > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center
> > >> > > > Riesstrasse 25, 80992 München
> > >> > > >
> > >> > > > E-mail: radu.tudoran@huawei.com
> > >> > > > Mobile: +49 15209084330
> > >> > > > Telephone: +49 891588344173
> > >> > > >
> > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205, 40549
> > >> > > > Düsseldorf, Germany, www.huawei.com Registered
> > >> > > > Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
> Managing
> > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der
> > Gesellschaft:
> > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This
e-mail
> > and
> > >> > > > its attachments contain confidential information from HUAWEI,
> > which
> > >> > > > is intended only for the person or entity whose address
is
> listed
> > >> > above.
> > >> > > > Any use of the information contained herein in any way
> (including,
> > >> > > > but not limited to, total or partial disclosure, reproduction,
> or
> > >> > > > dissemination) by persons other than the intended recipient(s)
> is
> > >> > > > prohibited. If you receive this e-mail in error, please
notify
> the
> > >> > > > sender by phone or email immediately and delete it!
> > >> > > >
> > >> > > >
> > >> > > > -----Original Message-----
> > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > >> > > > To: dev@flink.apache.org
> > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
Metadata
> > >> > > >
> > >> > > > Sure, I also thought about this but went for the "extreme"
> > initially.
> > >> > > > If no-one objects I'll update the doc in a bit.
> > >> > > >
> > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <sewen@apache.org>
> > wrote:
> > >> > > >
> > >> > > > > Thanks for opening this.
> > >> > > > >
> > >> > > > > I see the need for having an extensible context object
for
> > window
> > >> > > > > function invocations, but i think hiding every parameter
in
> the
> > >> > > > > context is a bit unnatural.
> > >> > > > >
> > >> > > > > How about having a function "apply(Key, Values, WindowContext,
> > >> > > > Collector)"
> > >> > > > > ?
> > >> > > > > It should be possible to write the straightforward
use cases
> > >> > > > > without accessing the context object.
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek
> > >> > > > > <aljoscha@apache.org>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi,
> > >> > > > > > this is a proposal to introduce a new interface
for the
> window
> > >> > > > > > function
> > >> > > > > to
> > >> > > > > > make it more extensible for the future where we
might want
> to
> > >> > > > > > provide additional information about why a window
fired to
> the
> > >> > > > > > user
> > >> > > > function:
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending
> > >> > > > > +W
> > >> > > > > in
> > >> > > > > dow+Function+Metadata
> > >> > > > > >
> > >> > > > > > I'd appreciate your thoughts!
> > >> > > > > >
> > >> > > > > > Cheers,
> > >> > > > > > Aljoscha
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message