flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Send events to parallel operator instances
Date Wed, 03 Jun 2015 21:44:59 GMT
I think that it may be a bit pre-mature to invest heavily into the parallel
delta-policy windows just yet.
We have not even answered all questions on the key-local delta windows yet:

 - How does it behave with non-monotonous changes? What does the delta
refer to, the max interval in the window, the interval to the earliest
element. The max difference between two consecutive elements?

 - What about the order of records? Are deltas even interesting when
records come in arbitrary order? What about the predictability of recovery
runs?


I would assume that a consistent version of the key-local delta windows
will get us a long way, use-case wise.

Let's learn more about how users use these policies in the "simple" case.
Because that will impact the protocol for global coordination (for examplea
concerning order and relative to what element are the deltas computed, the
first or the min). Otherwise we invest a lot of effort into something where
we have not yet a clear understanding about how we actually want it to
behave, exactly.

What do you think?




On Wed, Jun 3, 2015 at 2:14 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:

> I am talking of course about global delta windows. On the full stream not
> on a partition. Delta windows per partition happens as you said currently
> as well.
>
> On Wednesday, June 3, 2015, Aljoscha Krettek <aljoscha@apache.org> wrote:
>
> > Yes, this is obvious, but if we simply partition the data on the
> > attribute that we use for the delta policy this can be done purely on
> > one machine. No need for complex communication/synchronization.
> >
> > On Wed, Jun 3, 2015 at 1:32 PM, Gyula Fóra <gyula.fora@gmail.com
> > <javascript:;>> wrote:
> > > Yes, we define a delta function from the first element to the last
> > element
> > > in a window. Now let's discretize the stream using this semantics in
> > > parallel.
> > >
> > > Aljoscha Krettek <aljoscha@apache.org <javascript:;>> ezt írta
> > (időpont: 2015. jún. 3.,
> > > Sze, 12:20):
> > >
> > >> Ah ok. And by distributed you mean that the element that starts the
> > >> window can be processed on a different machine than the element that
> > >> finishes the window?
> > >>
> > >> On Wed, Jun 3, 2015 at 12:11 PM, Gyula Fóra <gyula.fora@gmail.com
> > <javascript:;>> wrote:
> > >> > This is not connected to the current implementation. So lets not
> talk
> > >> about
> > >> > that.
> > >> >
> > >> > This is about theoretical limits to support distributed delta
> policies
> > >> > which has far reaching implications for the windowing policies one
> can
> > >> > implement in a prallel way.
> > >> >
> > >> > But you are welcome to throw in any constructive ideas :)
> > >> > On Wed, Jun 3, 2015 at 11:49 AM Aljoscha Krettek <
> aljoscha@apache.org
> > <javascript:;>>
> > >> > wrote:
> > >> >
> > >> >> Part of the reason for my question is this:
> > >> >> https://issues.apache.org/jira/browse/FLINK-1967. Especially my
> > latest
> > >> >> comment there. If we want this, I think we have to overhaul the
> > >> >> windowing system anyways and then it doesn't make sense to explore
> > >> >> complicated workarounds for the current system.
> > >> >>
> > >> >> On Wed, Jun 3, 2015 at 11:07 AM, Gyula Fóra <gyula.fora@gmail.com
> > <javascript:;>>
> > >> wrote:
> > >> >> > There are simple ways of implementing it in a non-distributed
or
> > >> >> > inconsistent fashion.
> > >> >> > On Wed, Jun 3, 2015 at 8:55 AM Aljoscha Krettek <
> > aljoscha@apache.org <javascript:;>>
> > >> >> wrote:
> > >> >> >
> > >> >> >> This already sounds awfully complicated. Is there no
other way
> to
> > >> >> >> implement the delta windows?
> > >> >> >>
> > >> >> >> On Wed, Jun 3, 2015 at 7:52 AM, Gyula Fóra <
> gyula.fora@gmail.com
> > <javascript:;>>
> > >> >> wrote:
> > >> >> >> > Hi Ufuk,
> > >> >> >> >
> > >> >> >> > In the concrete use case I have in mind I only want
to send
> > events
> > >> to
> > >> >> >> > another subtask of the same task vertex.
> > >> >> >> >
> > >> >> >> > Specifically: if we want to do distributed delta
based windows
> > we
> > >> >> need to
> > >> >> >> > send after every trigger the element that has triggered
the
> > current
> > >> >> >> window.
> > >> >> >> > So practically I want to broadcast some event regularly
to all
> > >> >> subtasks
> > >> >> >> of
> > >> >> >> > the same operator.
> > >> >> >> >
> > >> >> >> > In this case the operators would wait until they
receive this
> > event
> > >> >> so we
> > >> >> >> > need to make sure that this event sending is not
blocked by
> the
> > >> actual
> > >> >> >> > records.
> > >> >> >> >
> > >> >> >> > Gyula
> > >> >> >> >
> > >> >> >> > On Tuesday, June 2, 2015, Ufuk Celebi <uce@apache.org
> > <javascript:;>> wrote:
> > >> >> >> >
> > >> >> >> >>
> > >> >> >> >> On 02 Jun 2015, at 22:45, Gyula Fóra <gyfora@apache.org
> > <javascript:;>
> > >> >> <javascript:;>>
> > >> >> >> >> wrote:
> > >> >> >> >> > I am wondering, what is the suggested way
to send some
> events
> > >> >> >> directly to
> > >> >> >> >> > another parallel instance in a flink job?
For example from
> > one
> > >> >> mapper
> > >> >> >> to
> > >> >> >> >> > another mapper (of the same operator).
> > >> >> >> >> >
> > >> >> >> >> > Do we have any internal support for this?
The first thing
> > that
> > >> we
> > >> >> >> thought
> > >> >> >> >> > of is iterations but that is clearly an
overkill.
> > >> >> >> >>
> > >> >> >> >> There is no support for this at the moment.
Any parallel
> > instance?
> > >> >> Or a
> > >> >> >> >> subtask instance of the same task?
> > >> >> >> >>
> > >> >> >> >> Can you provide more input on the use case?
It is certainly
> > >> possible
> > >> >> to
> > >> >> >> >> add support for this.
> > >> >> >> >>
> > >> >> >> >> If the events don't need to be inline with the
records, we
> can
> > >> easily
> > >> >> >> >> setup the TaskEventDispatcher as a separate
actor (or extend
> > the
> > >> task
> > >> >> >> >> manager) to process both backwards flowing events
and in
> > general
> > >> any
> > >> >> >> events
> > >> >> >> >> that don't need to be inline with the records.
The task
> > deployment
> > >> >> >> >> descriptors need to be extended with the extra
parallel
> > instance
> > >> >> >> >> information.
> > >> >> >> >>
> > >> >> >> >> – Ufuk
> > >> >> >>
> > >> >>
> > >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message