Hi Vasia,
I am trying to look at the problem in more detail. Which version of the MST
are you talking about?
Right now in the Gelly repository I can only find the SSSP example
(parallel BellmanFord) from Section 4.2 in [1].
However, it seems that the issues encountered by Andra are related to the
implementation of Parallel Boruvka (Section 3.2 in [2]). Is that correct?
Regards,
A.
[1] http://www.vldb.org/pvldb/vol7/p1047han.pdf
[2] http://www.vldb.org/pvldb/vol7/p577salihoglu.pdf
20150219 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikalavri@gmail.com>:
> Hello beautiful Flink people,
>
> during the past few days, Andra and I have been discussing about how to
> extend Gelly's iteration methods.
>
> Alexander's course (and his awesome students) has made it obvious that
> vertexcentric iterations are not the best fit for algorithms which don't
> follow the common "propagateupdate" pattern. For example, Andra is working
> on an implementation of Minimum Spanning Tree, which requires branching
> inside an iteration and also requires a convergence check of an internal
> iteration. Others also reported similar issues [1, 2]. Trying to fit such
> algorithms to the vertexcentric model leads to long and ugly code, e.g.
> aggregators to keep track of algorithm phases, duplicating data, etc.
>
> One limitation of the vertexcentric and the upcoming GAS model is that
> they both only allow the vertex values to be updated in each iteration.
> However, for some algorithms we need to update the edge values and in
> others we need to update both. In even more complex situations (like
> Andra's MST) in some iterations we need to update the vertex values and in
> some iterations we need to update the edge values.
> Another problem is that we currently don't have a way to allow different
> computational phases inside an iteration. This is something that Giraph
> solves with master compute, a function that is executed once before each
> superstep and sets the computation function.
>
> All that said, I believe that we can solve most of these issues if we
> nicely expose Flink's iteration operators in Gelly. I can see the following
> cases:
>
> 1. Bulk & delta iterations where the solution set is the vertex dataset:
> this will be similar to vertexcentric and GAS, but will allow more
> flexible dataflows inside the iteration.
> 2. Bulk & delta iterations where the solution set is the edge dataset: for
> the cases where we need to update edge values.
> 3. Bulk & delta iterations where the solution set is the Graph: this will
> cover more complex cases, where the algorithm updates both vertices and
> edges or even adds/removes vertices/edges, i.e. updates the whole Graph.
>
> What do you think? I can see 1 & 2 being very easy to implement, but I
> suspect 3 won't be that easy (but so awesome to have ^^).
> Would it work the way a Graph is represented now, i.e. with 2 DataSets?
>
> Any comment, idea, pointer would be much appreciated! Thank you ^^
>
> Cheers,
> V.
>
> [1]:
>
> http://apacheflinkincubatorusermailinglistarchive.2336050.n4.nabble.com/CanamasterclasscontrolthesuperstepinFlinkSpargeltd733.html
> [2]:
>
> http://issues.apache.org/jira/browse/FLINK1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel#comment14325769
>
