flink-dev mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
Date Wed, 02 Nov 2016 22:15:44 GMT
No problem, Aljoscha. I'll follow up with my use cases.

Thanks,
Manu

On Wed, Nov 2, 2016 at 9:15 PM Aljoscha Krettek <aljoscha@apache.org> wrote:

> Hi Manu,
> it's great that you want to work on this but another contributor has also
> started looking into this and has some code already. It's unfortunate that
> there is no Jira issue yet for this but I think he'll open one this week.
>
> Cheers,
> Aljoscha
>
> On Wed, 2 Nov 2016 at 13:00 Manu Zhang <owenzhang1990@gmail.com> wrote:
>
> > Hi Aljoscha,
> >
> > Have you started working on ProcessWindowFunction? If not, may I take
> > this task?
> >
> > Thanks,
> > Manu
> >
> >
> > On Wed, Nov 2, 2016 at 5:16 PM Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> >
> > > I think we reached consensus here so I would like to mark this FLIP as
> > > accepted. We will now proceed with implementing the first step, i.e.
> > > adding the new ProcessWindowFunction.
> > >
> > > On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> > >
> > > > Alright, that seems reasonable. I updated the doc to add the
> > > > Collector to the method signature again.
> > > >
> > > > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <sewen@apache.org> wrote:
> > > >
> > > > The Collector is pretty integral in all the other functions that
> > > > return multiple elements. I honestly don't see us switching away from
> > > > it, given that it is such a core part of the API.
> > > >
> > > > The close() method has, to the best of my knowledge, not caused issues
> > > > yet. I cannot recall anyone mentioning that the close() method confused
> > > > them, that they accidentally called it, etc.
> > > > I am wondering whether this is more of a theoretical than a practical
> > > > issue.
> > > >
> > > > If we move one function away from Collector to be on the safe side for
> > > > a "maybe change in the future", while keeping Collector in all other
> > > > functions - I think that fragments the API concept-wise more than it
> > > > improves anything.
> > > >
> > > > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <aljoscha@apache.org>
> > > > wrote:
> > > >
> > > > > @Stephan: For the Output, should we keep using a Collector (which
> > > > > exposes the close() method that should never be called by users) or
> > > > > create a new Output type that only has an "output" method? Collector
> > > > > can also be used, but with a close() method that doesn't do anything.
> > > > > In the long run, I thought it might be better to switch the type away
> > > > > from Collector.
> > > > >
> > > > > Cheers,
> > > > > Aljoscha
> > > > >
> > > > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mxm@apache.org>
> > > > > wrote:
> > > > >
> > > > > > I think it looks like Beam rather than Hadoop :)
> > > > > >
> > > > > > What Stephan meant was that he wanted a dedicated output method in
> > > > > > the ProcessWindowFunction. I agree with Aljoscha that we shouldn't
> > > > > > expose the collector.
> > > > > >
> > > > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek
> > > > > > <aljoscha@apache.org> wrote:
> > > > > > > You mean keep the Collector? I don't like that one because it has
> > > > > > > the close() method that should never be called by the user.
> > > > > > >
> > > > > > > We can keep it, though, because all the other user function
> > > > > > > interfaces also expose it to the user.
> > > > > > >
> > > > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <sewen@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> I would actually make the output a separate parameter as well.
> > > > > > >> Pretty much like the old variant, only replacing the "Window"
> > > > > > >> parameter by the context (which contains everything about the
> > > > > > >> window).
> > > > > > >> It could also be called "WindowInvocationContext" or so.
> > > > > > >>
> > > > > > >> The current variant looks too Hadoop to me ;-) Everything done on
> > > > > > >> the context object, and messy mocking when creating tests.
> > > > > > >>
> > > > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran
> > > > > > >> <radu.tudoran@huawei.com> wrote:
> > > > > > >>
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > >> > Sorry - I made a mistake - I was thinking of getting access to
> > > > > > >> > the collection (I misread "collector" :) ) of events in the
> > > > > > >> > window buffer in order to be able to delete/evict some of them
> > > > > > >> > which are not necessarily the last ones.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Radu
> > > > > > >> >
> > > > > > >> > -----Original Message-----
> > > > > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > > >> > Sent: Monday, July 18, 2016 5:54 PM
> > > > > > >> > To: dev@flink.apache.org
> > > > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > > > > > >> >
> > > > > > >> > What about the collector? This is only used for emitting
> > > > > > >> > elements to the downstream operation.
> > > > > > >> >
> > > > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran
> > > > > > >> > <radu.tudoran@huawei.com> wrote:
> > > > > > >> >
> > > > > > >> > > Hi,
> > > > > > >> > >
> > > > > > >> > > I think it looks good and, most importantly, we can extend
> > > > > > >> > > it in the directions discussed so far.
> > > > > > >> > >
> > > > > > >> > > One question though regarding the Collector - are we going
> > > > > > >> > > to be able to delete random elements from the list if this
> > > > > > >> > > is not exposed as a collection, at least to the evictor? If
> > > > > > >> > > not, how are we going to extend in the future to cover this
> > > > > > >> > > case?
> > > > > > >> > >
> > > > > > >> > > Regarding the ordering - I also observed that there are
> > > > > > >> > > situations where elements do not have a logical order. One
> > > > > > >> > > example is if you have high rates of events. Nevertheless,
> > > > > > >> > > even if now is not the time for this, I think in the future
> > > > > > >> > > we can imagine also having some data structures that offer
> > > > > > >> > > some ordering. It can save some computation effort later in
> > > > > > >> > > the functions for some use cases.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > Best regards,
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > -----Original Message-----
> > > > > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > > > > > >> > > To: dev@flink.apache.org
> > > > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > > > > > >> > >
> > > > > > >> > > I incorporated the changes. The proposed interface of
> > > > > > >> > > ProcessWindowFunction is now this:
> > > > > > >> > >
> > > > > > >> > > public abstract class ProcessWindowFunction<IN, OUT, KEY, W extends Window>
> > > > > > >> > >         implements Function {
> > > > > > >> > >
> > > > > > >> > >     public abstract void process(KEY key, Iterable<IN> elements, Context ctx)
> > > > > > >> > >             throws Exception;
> > > > > > >> > >
> > > > > > >> > >     public abstract class Context {
> > > > > > >> > >         public abstract W window();
> > > > > > >> > >         public abstract void output(OUT value);
> > > > > > >> > >     }
> > > > > > >> > > }
> > > > > > >> > >
> > > > > > >> > > I'm proposing to not expose Collector anymore because it has
> > > > > > >> > > the close() method that should not be called by users.
> > > > > > >> > > Having the output() call directly on the context should work
> > > > > > >> > > just as well.
> > > > > > >> > >
> > > > > > >> > > Also, I marked "adding a firing reason" and "adding a firing
> > > > > > >> > > counter" as future work; they are only examples of things
> > > > > > >> > > that can be implemented on top of the new interface.
> > > > > > >> > > Initially, this will provide exactly the same features as
> > > > > > >> > > the old API but be extensible. I did this to not make the
> > > > > > >> > > scope of this proposal too big because Radu also suggested
> > > > > > >> > > more changes, and each of them should be covered in a
> > > > > > >> > > separate design doc or FLIP.
> > > > > > >> > >
> > > > > > >> > > @Radu: On the different buffer types. I think this would be
> > > > > > >> > > very tricky. Right now, people should also not rely on the
> > > > > > >> > > fact that elements are "FIFO". Some state backends might
> > > > > > >> > > keep the elements in a different order, and when you have
> > > > > > >> > > merging windows/session windows the order of the elements
> > > > > > >> > > will also not be preserved.
> > > > > > >> > >
> > > > > > >> > > Cheers,
> > > > > > >> > > Aljoscha
> > > > > > >> > >
> > > > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran
> > > > > > >> > > <radu.tudoran@huawei.com> wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi,
> > > > > > >> > > >
> > > > > > >> > > > If the idea is to extend the Context to pass more
> > > > > > >> > > > information between the stages of processing a window
> > > > > > >> > > > (triggering -> process -> eviction), why not also add an
> > > > > > >> > > > "EvictionInfo"? I think this might actually help with the
> > > > > > >> > > > issues discussed in the thread related to the eviction
> > > > > > >> > > > policy. I could imagine using this parameter to pass the
> > > > > > >> > > > conditions, from the processing stage to the evictor,
> > > > > > >> > > > about which events are to be eliminated.
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > public abstract class Context {
> > > > > > >> > > >
> > > > > > >> > > >    public abstract EvictionInfo evictionInfo();
> > > > > > >> > > >
> > > > > > >> > > > ...
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >    public abstract KEY key();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract W window();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract int id();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract FiringInfo firingInfo();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract Iterable<IN> elements();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract void output(OUT value);
> > > > > > >> > > >
> > > > > > >> > > > }
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Also, on a slightly unrelated issue - how hard would it be
> > > > > > >> > > > to introduce different types of buffers for the windows?
> > > > > > >> > > > Currently the existing one behaves (when under processing)
> > > > > > >> > > > similarly to a FIFO queue (in the sense that you need to
> > > > > > >> > > > start from the beginning, from the oldest element).
> > > > > > >> > > > How about also enabling, for example, LIFO behavior (start
> > > > > > >> > > > iterating through the list from the most recent element)?
> > > > > > >> > > > As in the source queues or stacks are not actually used,
> > > > > > >> > > > perhaps we can just pass policies to the iterator - or
> > > > > > >> > > > have custom iterators.
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Dr. Radu Tudoran
> > > > > > >> > > > Research Engineer - Big Data Expert
> > > > > > >> > > > IT R&D Division
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH
> > > > > > >> > > > European Research Center
> > > > > > >> > > > Riesstrasse 25, 80992 München
> > > > > > >> > > >
> > > > > > >> > > > E-mail: radu.tudoran@huawei.com
> > > > > > >> > > > Mobile: +49 15209084330
> > > > > > >> > > > Telephone: +49 891588344173
> > > > > > >> > > >
> > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH
> > > > > > >> > > > Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
> > > > > > >> > > > Registered Office: Düsseldorf, Register Court Düsseldorf,
> > > > > > >> > > > HRB 56063,
> > > > > > >> > > > Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
> > > > > > >> > > > Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf,
> > > > > > >> > > > HRB 56063,
> > > > > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
> > > > > > >> > > > This e-mail and its attachments contain confidential
> > > > > > >> > > > information from HUAWEI, which is intended only for the
> > > > > > >> > > > person or entity whose address is listed above. Any use of
> > > > > > >> > > > the information contained herein in any way (including,
> > > > > > >> > > > but not limited to, total or partial disclosure,
> > > > > > >> > > > reproduction, or dissemination) by persons other than the
> > > > > > >> > > > intended recipient(s) is prohibited. If you receive this
> > > > > > >> > > > e-mail in error, please notify the sender by phone or
> > > > > > >> > > > email immediately and delete it!
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > -----Original Message-----
> > > > > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > > > > > >> > > > To: dev@flink.apache.org
> > > > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > > > > > >> > > >
> > > > > > >> > > > Sure, I also thought about this but went for the "extreme"
> > > > > > >> > > > initially. If no-one objects I'll update the doc in a bit.
> > > > > > >> > > >
> > > > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <sewen@apache.org>
> > > > > > >> > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Thanks for opening this.
> > > > > > >> > > > >
> > > > > > >> > > > > I see the need for having an extensible context object
> > > > > > >> > > > > for window function invocations, but I think hiding
> > > > > > >> > > > > every parameter in the context is a bit unnatural.
> > > > > > >> > > > >
> > > > > > >> > > > > How about having a function
> > > > > > >> > > > > "apply(Key, Values, WindowContext, Collector)"?
> > > > > > >> > > > > It should be possible to write the straightforward use
> > > > > > >> > > > > cases without accessing the context object.
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek
> > > > > > >> > > > > <aljoscha@apache.org> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi,
> > > > > > >> > > > > > this is a proposal to introduce a new interface for
> > > > > > >> > > > > > the window function to make it more extensible for
> > > > > > >> > > > > > the future where we might want to provide additional
> > > > > > >> > > > > > information about why a window fired to the user
> > > > > > >> > > > > > function:
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata
> > > > > > >> > > > > >
> > > > > > >> > > > > > I'd appreciate your thoughts!
> > > > > > >> > > > > >
> > > > > > >> > > > > > Cheers,
> > > > > > >> > > > > > Aljoscha
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
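
The shape the thread converged on (key, elements, an extensible context, and a Collector passed as a separate parameter) can be sketched in Java roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code that was eventually committed: Collector and TimeWindow are simplified stand-ins for the real types, and CountPerWindow, WindowSketch and the exact parameter order are hypothetical names and choices made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Flink's Collector (assumption): only the emit method is modelled.
interface Collector<T> {
    void collect(T record);
}

// Stand-in for a window (assumption): a half-open time interval [start, end).
final class TimeWindow {
    final long start;
    final long end;

    TimeWindow(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

// Sketch of the discussed interface: output goes through a separate Collector,
// window metadata is read from an extensible Context.
abstract class ProcessWindowFunction<IN, OUT, KEY, W> {

    public abstract void process(KEY key, Iterable<IN> elements, Context ctx, Collector<OUT> out)
            throws Exception;

    // Further metadata (e.g. a firing reason) could be added here later
    // without changing the signature of process().
    public abstract class Context {
        public abstract W window();
    }
}

final class WindowSketch {

    // Hypothetical user function: counts the elements of one window.
    static final class CountPerWindow
            extends ProcessWindowFunction<String, String, String, TimeWindow> {
        @Override
        public void process(String key, Iterable<String> elements, Context ctx, Collector<String> out) {
            int count = 0;
            for (String ignored : elements) {
                count++;
            }
            TimeWindow w = ctx.window();
            out.collect(key + "@[" + w.start + "," + w.end + ")=" + count);
        }
    }

    // Drives the function by hand, the way a unit test might.
    static List<String> run() {
        CountPerWindow fn = new CountPerWindow();
        List<String> results = new ArrayList<>();
        // Context is an inner class, so it is created against the function instance.
        ProcessWindowFunction<String, String, String, TimeWindow>.Context ctx = fn.new Context() {
            @Override
            public TimeWindow window() {
                return new TimeWindow(0L, 10L);
            }
        };
        // A plain list stands in for the downstream operator.
        fn.process("user-1", List.of("a", "b", "c"), ctx, results::add);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With the Collector as its own parameter, a test can pass a method reference such as results::add instead of mocking an output method on the context, which is one way to read the "messy mocking" concern raised in the thread.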
