flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: buffering in operators, implementing statistics
Date Mon, 23 May 2016 15:18:52 GMT
Hi,
no such changes are planned right now. The separaten between the keys is
very strict in order to make the windowing state re-partitionable so that
we can implement dynamic rescaling of the parallelism of a program.

The WindowAll is only used for specific cases where you need a Trigger that
sees all elements of the stream. I personally don't think it is very useful
because it is not scaleable. In theory, for time windows this can be
parallelized but it is not currently done in Flink.

Do you have a specific use case for the count-min sketch in mind. If not,
maybe our energy is better spent on producing examples with real-world
applicability. I'm not against having an example for a count-min sketch,
I'm just worried that you might put your energy into something that is not
useful to a lot of people.

Cheers,
Aljoscha

On Fri, 20 May 2016 at 20:13 Stavros Kontopoulos <st.kontopoulos@gmail.com>
wrote:

> Hi thnx for the feedback.
>
> So there is a limitation due to parallel windows implementation.
> No intentions to change that somehow to accommodate similar estimations?
>
> WindowAll in practice is used as step in the pipeline? I mean since its
> inherently not parallel cannot scale correct?
> Although there is an exception: "Only for special cases, such as aligned
> time windows is it possible to perform this operation in parallel"
> Probably missing something...
>
> I could try do the example stuff (and open a new feature on jira for that).
> I will also vote for closing the old issue too since there is no other way
> at least for the time being...
>
> Thanx,
> Stavros
>
> On Fri, May 20, 2016 at 7:02 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
> > Hi,
> > with how the window API currently works this can only be done for
> > non-parallel windows. For keyed windows everything that happens is scoped
> > to the key of the elements: window contents are kept in per-key state,
> > triggers fire on a per-key basis. Therefore a count-min sketch cannot be
> > used because it would require to keep state across keys.
> >
> > For non-parallel windows a user could do this:
> >
> > DataStream input = ...
> > input
> >   .windowAll(<some window>)
> >   .fold(new MySketch(), new MySketchFoldFunction())
> >
> > with sketch data types and a fold function that is tailored to the user
> > types. Therefore, I would prefer to not add a special API for this and
> vote
> > to close https://issues.apache.org/jira/browse/FLINK-2147. I already
> > commented on https://issues.apache.org/jira/browse/FLINK-2144, saying a
> > similar thing.
> >
> > What I would welcome very much is to add some well documented examples to
> > Flink that showcase how some of these operations can be written.
> >
> > Cheers,
> > Aljoscha
> >
> > On Thu, 19 May 2016 at 16:38 Stavros Kontopoulos <
> st.kontopoulos@gmail.com
> > >
> > wrote:
> >
> > > Hi guys,
> > >
> > > I would like to push forward the work here:
> > > https://issues.apache.org/jira/browse/FLINK-2147
> > >
> > > Can anyone more familiar with streaming api verify if this could be a
> > > mature task.
> > > The intention is to summarize data over a window like in the case of
> > > StreamGroupedFold.
> > > Specifically implement count min in a window.
> > >
> > > Best,
> > > Stavros
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message