flink-dev mailing list archives

From Suneel Marthi <smar...@apache.org>
Subject Re: Iteration Intermediate Output
Date Mon, 30 May 2016 12:49:58 GMT
This is a feature that the Mahout project requested a few months ago, for
the very same reasons mentioned in previous emails on this thread, but we
were snubbed by the Flink folks, who called it a '*WAY too specific*'
request for Flink to deal with and said 'it's got to be done the way
Flink has it', etc...

While delta iterations are really cool, it's not trivial to have them as
part of language-specific DSLs that handle more general iterations.  It's
good to see that this limitation has started to bite others, and hopefully
Data Artisans now sees this as a much-needed feature.



On Mon, May 30, 2016 at 8:31 AM, Gábor Gévay <ggab90@gmail.com> wrote:

> Hello,
>
> > Would the best way be to extend the iteration operators to support
> > intermediate outputs or revisit the idea of caching intermediate results
> > and thus allow efficient for-loop iterations?
>
> Caching intermediate results would also greatly help projects that
> are targeting Flink as a backend, like Emma [1] and SystemML [2]. The
> issue here is that these languages allow writing more general
> iterations (general control flow (nested loops, ifs in the loop body),
> multiple "solution sets", doing something else with the intermediate
> results, etc.), that can't be translated to Flink's iteration
> constructs. So these systems currently don't have much better options
> than just writing intermediate results to files, which is not so nice.
>
> Best,
> Gabor
>
> [1]
> http://www.user.tu-berlin.de/asteriosk/assets/publications/emma-sigmod2015.pdf
> [2] https://systemml.apache.org/
>
>
>
> 2016-05-28 13:48 GMT+02:00 Vasiliki Kalavri <vasilikikalavri@gmail.com>:
> > Hey,
> >
> > it would be great to add this feature indeed! Thanks for bringing it up
> > Greg :)
> > Would the best way be to extend the iteration operators to support
> > intermediate outputs or revisit the idea of caching intermediate results
> > and thus allow efficient for-loop iterations?
> >
> > -Vasia.
> >
> > On 26 May 2016 at 22:41, Greg Hogan <code@greghogan.com> wrote:
> >
> >> Hi y'all,
> >>
> >> I think this is an oft-requested feature [0] and there are many graph
> >> algorithms for which intermediate output is the desired result. I'd like to
> >> take Stephan up on his offer [1] for pointers.
> >>
> >> I have yet to get in deep, but I see that iteration tasks are treated
> >> specially as IterationIntermediateTask for synchronization between
> >> supersteps. Also, when OperatorTranslation and GraphCreatingVisitor are
> >> walking the program DAG, an iteration must first be reached through the
> >> tail.
> >>
> >> Greg
> >>
> >> [0]
> >> http://stackoverflow.com/questions/37224140/possibility-of-saving-partial-outputs-from-bulk-iteration-in-flink-dataset
> >> [1]
> >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Intermediate-output-during-delta-iterations-td436.html
> >>
>
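
To make the trade-off discussed in this thread concrete, here is a minimal
sketch contrasting a closed iteration operator, which exposes only the final
result, with a driver-side for-loop that keeps each superstep's result cached
in memory. It is plain Python and deliberately uses no Flink API; the names
closed_iteration, loop_with_cached_intermediates, and make_min_label_step are
hypothetical, and the min-label propagation step is only a stand-in for a
graph algorithm whose intermediate output is of interest.

```python
# Plain-Python stand-in for the two iteration styles; nothing here is Flink API.
# All function names are hypothetical and exist only for illustration.

def closed_iteration(initial, step, max_iterations):
    """Mimics a closed iteration operator: only the final result is returned."""
    current = initial
    for _ in range(max_iterations):
        current = step(current)
    return current


def loop_with_cached_intermediates(initial, step, max_iterations):
    """Driver-side for-loop that keeps every superstep's result in memory."""
    cache = [initial]
    for _ in range(max_iterations):
        # Each step reads the previous cached result instead of a file.
        cache.append(step(cache[-1]))
    return cache


def make_min_label_step(edges):
    """Builds one superstep of min-label propagation over undirected edges."""
    def step(labels):
        new_labels = dict(labels)
        for u, v in edges:
            # Both endpoints adopt the smaller label seen in the previous superstep.
            low = min(labels[u], labels[v])
            new_labels[u] = min(new_labels[u], low)
            new_labels[v] = min(new_labels[v], low)
        return new_labels
    return step


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (4, 5)]
    labels = {v: v for v in (1, 2, 3, 4, 5)}
    step = make_min_label_step(edges)

    # Intermediate labels per superstep are available to the caller.
    for i, snapshot in enumerate(loop_with_cached_intermediates(labels, step, 3)):
        print(i, snapshot)

    # The closed form yields only the last snapshot.
    print(closed_iteration(labels, step, 3))
```

The second function is what "caching intermediate results" would buy a
frontend such as the DSLs mentioned above: the loop body stays ordinary
control flow, and each superstep's output can be inspected or reused without
a round trip through files.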
