flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Márton Balassi <balassi.mar...@gmail.com>
Subject Re: Iterative stream processing - cyclic JobGraph?
Date Thu, 10 Jul 2014 12:20:36 GMT
After a recent skype meeting with Stefan we've came up with the following
API based on the ideas Gyula described in the last message.
https://github.com/stratosphere/stratosphere-streaming/blob/iterate/stratosphere-streaming-core/src/test/java/eu/stratosphere/streaming/api/IterateTest.java#L80-85

IterativeDataStream<Tuple1<Integer>> source =
env.fromElements(1).flatMap(new Forward()).iterate();

DataStream<Tuple1<Integer>> increment = source.flatMap(new Increment());

source.closeWith(increment).print;

This way the JobGraph stays acyclic, but iteration is enabled as a feature.

Cheers,

Marton

On Wed, Jul 9, 2014 at 4:59 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>
> Hey,
> I implemented a simple version of what you do for IterativeDatasets for
> streaming with using BlockingQueues to pass tuples between inmemory tasks.
> It seems to be working locally to contruct cycles in the graph.
> Is there a reason why we should use BlockingBackChannnel instead of a
> single BlockingQueue when we dont need to serialize the tuples anyway (we
> are staying in-memory)?
>
> Regards,
> Gyula
>
>
> On Wed, Jul 9, 2014 at 10:26 AM, Ufuk Celebi <u.celebi@fu-berlin.de>
wrote:
>
> > Hey Gyula,
> >
> > the BlockingBackChannel is only used for data feedback. The
> > synchronization happens with a separate barrier called SuperstepBarrier.
> > (For delta iterations there is also a further SolutionSetUpdateBarrier).
> > But this are more "runtime hacks" (the DAG still looks like a DAG to the
> > execution engine) and you would have to think about how to model this
stuff
> > for streams.
> >
> > Best,
> >
> > Ufuk
> >
> > On 09 Jul 2014, at 10:14, Gyula Fóra <gyula.fora@gmail.com> wrote:
> >
> > > Hey,
> > >
> > > So we started digging through the IterativeDataSet and it seems like
you
> > > are using the BlockingBackChannel as this special feedback edge. Does
it
> > > work as a synchronization barrier would in BSP? If so it is probably
not
> > > suitable for processing continuous dataflows is it?
> > >
> > > Gyula
> > >
> > >
> > >
> > >
> > > On Tue, Jul 8, 2014 at 4:33 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >
> > >> Cyclic graphs are a problem for both the scheduler and the
> > >> currently-under-works recovery logic.
> > >>
> > >> General cyclic graphs are a bit of a design problem.
> > >>
> > >> I think what we need in the future is something like a special
feedback
> > >> edge that closes the cycle and that is synchronized (in batches /
> > >> generations) to make it transparent to the recovery and scheduling
> > logic.
> > >>
> > >> Stephan
> > >>
> >
> >

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message