flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Alexandrov <alexander.s.alexand...@gmail.com>
Subject Re: [DISCUSS] Gelly iteration abstractions
Date Sun, 22 Feb 2015 15:23:46 GMT
Hi Vasia,

I am trying to look at the problem in more detail. Which version of the MST
are you talking about?

Right now in the Gelly repository I can only find the SSSP example
(parallel Bellman-Ford) from Section 4.2 in [1].

However, it seems that the issues encountered by Andra are related to the
implementation of Parallel Boruvka (Section 3.2 in [2]). Is that correct?


[1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf
[2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf

2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikalavri@gmail.com>:

> Hello beautiful Flink people,
> during the past few days, Andra and I have been discussing about how to
> extend Gelly's iteration methods.
> Alexander's course (and his awesome students) has made it obvious that
> vertex-centric iterations are not the best fit for algorithms which don't
> follow the common "propagate-update" pattern. For example, Andra is working
> on an implementation of Minimum Spanning Tree, which requires branching
> inside an iteration and also requires a convergence check of an internal
> iteration. Others also reported similar issues [1, 2]. Trying to fit such
> algorithms to the vertex-centric model leads to long and ugly code, e.g.
> aggregators to keep track of algorithm phases, duplicating data, etc.
> One limitation of the vertex-centric and the upcoming GAS model is that
> they both only allow the vertex values to be updated in each iteration.
> However, for some algorithms we need to update the edge values and in
> others we need to update both. In even more complex situations (like
> Andra's MST) in some iterations we need to update the vertex values and in
> some iterations we need to update the edge values.
> Another problem is that we currently don't have a way to allow different
> computational phases inside an iteration. This is something that Giraph
> solves with master compute, a function that is executed once before each
> superstep and sets the computation function.
> All that said, I believe that we can solve most of these issues if we
> nicely expose Flink's iteration operators in Gelly. I can see the following
> cases:
> 1. Bulk & delta iterations where the solution set is the vertex dataset:
> this will be similar to vertex-centric and GAS, but will allow more
> flexible dataflows inside the iteration.
> 2. Bulk & delta iterations where the solution set is the edge dataset: for
> the cases where we need to update edge values.
> 3. Bulk & delta iterations where the solution set is the Graph: this will
> cover more complex cases, where the algorithm updates both vertices and
> edges or even adds/removes vertices/edges, i.e. updates the whole Graph.
> What do you think? I can see 1 & 2 being very easy to implement, but I
> suspect 3 won't be that easy (but so awesome to have ^^).
> Would it work the way a Graph is represented now, i.e. with 2 DataSets?
> Any comment, idea, pointer would be much appreciated! Thank you ^^
> Cheers,
> -V.
> [1]:
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html
> [2]:
> http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message