flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nam-Luc Tran <namluc.t...@euranova.eu>
Subject Re: Stale Synchronous Parallel iterations in Flink
Date Mon, 23 Feb 2015 15:55:43 GMT
Hello guys,

Thank you for your replies.

>(1) How to checkpoint such computations in order to recover them
upon failures.
Approximate snapshots are possible with SSP. In this case, a snapshot
(solution set and or working set, what is exactly snapshotted can be
discussed) is triggered once a "clock=t" message has been received
from each thread. The snapshot will not be "pure" meaning that it
contains strictly updates from 1...t, as some faster threads may be
ahead of t and their results may already be incorporated by slower
threads. However iterative convergent algorithms suitable for SSPs are
able to tolerate this situation.

>(2) How to check for the termination of the computation without a
sync barrier. Can the SSP model help with that?
>From what I have seen the SSP evaluates the convergence of the global
model in order to terminate the iterations. In our case, I guess this
would require to evaluate the convergence on the solution set and/or
emptiness of the working set. Is it possible to interrupt Flink
operators while they are running?

>We signal the end of a superstep by pushing a special message from
the "head" of the loop to the "tail". This can be thought of like a
clock cycle. Currently, the new superstep starts on each
parallel thread once all "tails" have received the message, thus
forming the BSP barrier. For delta iterations, this "tail" is
the next workset - the solution set completely independent of that.
Could you point me out the classes in the code that realise this
behaviour?

I am very interested in digging deeper and implementing this in Flink.
What would be the best point to start on from here according to you? 

Best regards,

Tran Nam-Luc

At Monday, 23/02/2015 on 10:32 Stephan Ewen wrote:

Hey Tran Nam-Luc!

Great post with some really cool thoughts.
I just posted this answer to your LinkedIN post.

Greetings,
Stephan

=============================================

Nice post, very cool idea! Your understanding of Flink in that respect
is
really good. I had not heard of SSP before,
but it seems to be a good compromise between bulk synchronous and
asynchronous iterations.

Here are some comments and thought about how Apache Flink realizes
iterations and how that mechanism can be extended
to support SSP:

- The loop in Flink is standing, operators are not re-created and
re-deployed in every iteration.

- We signal the end of a superstep by pushing a special message from
the
"head" of the loop to the "tail". This can be thought of
   like a clock cycle. Currently, the new superstep starts on each
parallel
thread once all "tails" have received the message,
   thus forming the BSP barrier. For delta iterations, this "tail"
is the
next workset - the solution set completely independent of that.

- We can probably interpret the "end-of-superstep" messages as clock
messages. We could then allow threads to start their
   next superstep if all "tails" have seen clock messages at least
of its
own clock time minus the slack.

If you are looking to implement this in Flink, or dig deeper into
this, let
me know, I would be happy to help.

On Fri, Feb 20, 2015 at 5:27 PM, Nam-Luc Tran 
wrote:

> Hello Everyone,
>
> I am Nam-Luc Tran, research Engineer at EURA NOVA [1]. Our research
> subjects cover distributed machine learning and we have been working
> on dataflow graph processing for a while now. We have been reading
> from you since Stratosphere :-)
>
> Our current research focuses on Stale Synchronous Parallelism and we
> are currently considering Apache Flink as a good candidate for
> implementing and delivering the best results among the existing
> processing solutions. I have written a post about it here:
>
>
https://www.linkedin.com/pulse/stale-synchronous-parallelism-new-frontier-apache-flink-nam-luc-tran
>
>
> What do you guys think about the approach? Does it seem feasible, or
> do you have anything similar in your roadmap?
>
> Best regards,
>
> Tran Nam-Luc
>
>
>
> Links:
> ------
> [1] http://euranova.eu
>
>



Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message