flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Send events to parallel operator instances
Date Thu, 04 Jun 2015 09:52:43 GMT
I am simply thinking about the best way to send data to different subtasks
of the same operator.

Can we go back to the original question? :D

Stephan Ewen <sewen@apache.org> ezt írta (időpont: 2015. jún. 3., Sze,
23:45):

> I think that it may be a bit pre-mature to invest heavily into the parallel
> delta-policy windows just yet.
> We have not even answered all questions on the key-local delta windows yet:
>
>  - How does it behave with non-monotonous changes? What does the delta
> refer to, the max interval in the window, the interval to the earliest
> element. The max difference between two consecutive elements?
>
>  - What about the order of records? Are deltas even interesting when
> records come in arbitrary order? What about the predictability of recovery
> runs?
>
>
> I would assume that a consistent version of the key-local delta windows
> will get us a long way, use-case wise.
>
> Let's learn more about how users use these policies in the "simple" case.
> Because that will impact the protocol for global coordination (for examplea
> concerning order and relative to what element are the deltas computed, the
> first or the min). Otherwise we invest a lot of effort into something where
> we have not yet a clear understanding about how we actually want it to
> behave, exactly.
>
> What do you think?
>
>
>
>
> On Wed, Jun 3, 2015 at 2:14 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>
> > I am talking of course about global delta windows. On the full stream not
> > on a partition. Delta windows per partition happens as you said currently
> > as well.
> >
> > On Wednesday, June 3, 2015, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
> >
> > > Yes, this is obvious, but if we simply partition the data on the
> > > attribute that we use for the delta policy this can be done purely on
> > > one machine. No need for complex communication/synchronization.
> > >
> > > On Wed, Jun 3, 2015 at 1:32 PM, Gyula Fóra <gyula.fora@gmail.com
> > > <javascript:;>> wrote:
> > > > Yes, we define a delta function from the first element to the last
> > > element
> > > > in a window. Now let's discretize the stream using this semantics in
> > > > parallel.
> > > >
> > > > Aljoscha Krettek <aljoscha@apache.org <javascript:;>> ezt
írta
> > > (időpont: 2015. jún. 3.,
> > > > Sze, 12:20):
> > > >
> > > >> Ah ok. And by distributed you mean that the element that starts the
> > > >> window can be processed on a different machine than the element that
> > > >> finishes the window?
> > > >>
> > > >> On Wed, Jun 3, 2015 at 12:11 PM, Gyula Fóra <gyula.fora@gmail.com
> > > <javascript:;>> wrote:
> > > >> > This is not connected to the current implementation. So lets
not
> > talk
> > > >> about
> > > >> > that.
> > > >> >
> > > >> > This is about theoretical limits to support distributed delta
> > policies
> > > >> > which has far reaching implications for the windowing policies
one
> > can
> > > >> > implement in a prallel way.
> > > >> >
> > > >> > But you are welcome to throw in any constructive ideas :)
> > > >> > On Wed, Jun 3, 2015 at 11:49 AM Aljoscha Krettek <
> > aljoscha@apache.org
> > > <javascript:;>>
> > > >> > wrote:
> > > >> >
> > > >> >> Part of the reason for my question is this:
> > > >> >> https://issues.apache.org/jira/browse/FLINK-1967. Especially
my
> > > latest
> > > >> >> comment there. If we want this, I think we have to overhaul
the
> > > >> >> windowing system anyways and then it doesn't make sense to
> explore
> > > >> >> complicated workarounds for the current system.
> > > >> >>
> > > >> >> On Wed, Jun 3, 2015 at 11:07 AM, Gyula Fóra <
> gyula.fora@gmail.com
> > > <javascript:;>>
> > > >> wrote:
> > > >> >> > There are simple ways of implementing it in a non-distributed
> or
> > > >> >> > inconsistent fashion.
> > > >> >> > On Wed, Jun 3, 2015 at 8:55 AM Aljoscha Krettek <
> > > aljoscha@apache.org <javascript:;>>
> > > >> >> wrote:
> > > >> >> >
> > > >> >> >> This already sounds awfully complicated. Is there
no other way
> > to
> > > >> >> >> implement the delta windows?
> > > >> >> >>
> > > >> >> >> On Wed, Jun 3, 2015 at 7:52 AM, Gyula Fóra <
> > gyula.fora@gmail.com
> > > <javascript:;>>
> > > >> >> wrote:
> > > >> >> >> > Hi Ufuk,
> > > >> >> >> >
> > > >> >> >> > In the concrete use case I have in mind I only
want to send
> > > events
> > > >> to
> > > >> >> >> > another subtask of the same task vertex.
> > > >> >> >> >
> > > >> >> >> > Specifically: if we want to do distributed
delta based
> windows
> > > we
> > > >> >> need to
> > > >> >> >> > send after every trigger the element that has
triggered the
> > > current
> > > >> >> >> window.
> > > >> >> >> > So practically I want to broadcast some event
regularly to
> all
> > > >> >> subtasks
> > > >> >> >> of
> > > >> >> >> > the same operator.
> > > >> >> >> >
> > > >> >> >> > In this case the operators would wait until
they receive
> this
> > > event
> > > >> >> so we
> > > >> >> >> > need to make sure that this event sending is
not blocked by
> > the
> > > >> actual
> > > >> >> >> > records.
> > > >> >> >> >
> > > >> >> >> > Gyula
> > > >> >> >> >
> > > >> >> >> > On Tuesday, June 2, 2015, Ufuk Celebi <uce@apache.org
> > > <javascript:;>> wrote:
> > > >> >> >> >
> > > >> >> >> >>
> > > >> >> >> >> On 02 Jun 2015, at 22:45, Gyula Fóra <gyfora@apache.org
> > > <javascript:;>
> > > >> >> <javascript:;>>
> > > >> >> >> >> wrote:
> > > >> >> >> >> > I am wondering, what is the suggested
way to send some
> > events
> > > >> >> >> directly to
> > > >> >> >> >> > another parallel instance in a flink
job? For example
> from
> > > one
> > > >> >> mapper
> > > >> >> >> to
> > > >> >> >> >> > another mapper (of the same operator).
> > > >> >> >> >> >
> > > >> >> >> >> > Do we have any internal support for
this? The first thing
> > > that
> > > >> we
> > > >> >> >> thought
> > > >> >> >> >> > of is iterations but that is clearly
an overkill.
> > > >> >> >> >>
> > > >> >> >> >> There is no support for this at the moment.
Any parallel
> > > instance?
> > > >> >> Or a
> > > >> >> >> >> subtask instance of the same task?
> > > >> >> >> >>
> > > >> >> >> >> Can you provide more input on the use case?
It is certainly
> > > >> possible
> > > >> >> to
> > > >> >> >> >> add support for this.
> > > >> >> >> >>
> > > >> >> >> >> If the events don't need to be inline with
the records, we
> > can
> > > >> easily
> > > >> >> >> >> setup the TaskEventDispatcher as a separate
actor (or
> extend
> > > the
> > > >> task
> > > >> >> >> >> manager) to process both backwards flowing
events and in
> > > general
> > > >> any
> > > >> >> >> events
> > > >> >> >> >> that don't need to be inline with the records.
The task
> > > deployment
> > > >> >> >> >> descriptors need to be extended with the
extra parallel
> > > instance
> > > >> >> >> >> information.
> > > >> >> >> >>
> > > >> >> >> >> – Ufuk
> > > >> >> >>
> > > >> >>
> > > >>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message