flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: [DISCUSS] Gelly iteration abstractions
Date Mon, 23 Feb 2015 10:05:14 GMT
As a workaround, it should always work to get the Edge and Vertex data set
from the graph and use the regular Fink iteration operators?


On Sun, Feb 22, 2015 at 4:53 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com
> wrote:

> Hi,
>
> yes, I was referring to the parallel Boruvka algorithm. There are several
> ways to implement this one in Flink and I believe that the one described in
> the paper (vertex-centric) is not the most elegant one :)
>
> Andra is now working on an idea that uses the delta iteration abstraction
> and we believe that it will be both more efficient and easier to
> understand. It has the edges in the solution set and the vertices in the
> workset, so it follows the pattern I describe in (2) in my previous e-mail.
> As a next step, we would like to see how having an iteration operator that
> could update the whole graph -what I describe as (3)- would make this even
> nicer.
>
> Any ideas are highly welcome!
>
> Cheers,
> V.
>
> On 22 February 2015 at 16:32, Andra Lungu <lungu.andra@gmail.com> wrote:
>
> > Hi Alex,
> >
> > Vasia is talking about the second version(presented Friday) of Parallel
> > Boruvka, which can be found here:
> > https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59
> >
> > I will propose the third, non-Pregel like approach directly to Gelly
> soon.
> >
> > If you have additional questions, I will be happy to answer them.
> >
> > Andra
> >
> > On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov <
> > alexander.s.alexandrov@gmail.com> wrote:
> >
> > > Hi Vasia,
> > >
> > > I am trying to look at the problem in more detail. Which version of the
> > MST
> > > are you talking about?
> > >
> > > Right now in the Gelly repository I can only find the SSSP example
> > > (parallel Bellman-Ford) from Section 4.2 in [1].
> > >
> > > However, it seems that the issues encountered by Andra are related to
> the
> > > implementation of Parallel Boruvka (Section 3.2 in [2]). Is that
> correct?
> > >
> > > Regards,
> > > A.
> > >
> > > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf
> > > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf
> > >
> > > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikalavri@gmail.com
> >:
> > >
> > > > Hello beautiful Flink people,
> > > >
> > > > during the past few days, Andra and I have been discussing about how
> to
> > > > extend Gelly's iteration methods.
> > > >
> > > > Alexander's course (and his awesome students) has made it obvious
> that
> > > > vertex-centric iterations are not the best fit for algorithms which
> > don't
> > > > follow the common "propagate-update" pattern. For example, Andra is
> > > working
> > > > on an implementation of Minimum Spanning Tree, which requires
> branching
> > > > inside an iteration and also requires a convergence check of an
> > internal
> > > > iteration. Others also reported similar issues [1, 2]. Trying to fit
> > such
> > > > algorithms to the vertex-centric model leads to long and ugly code,
> > e.g.
> > > > aggregators to keep track of algorithm phases, duplicating data, etc.
> > > >
> > > > One limitation of the vertex-centric and the upcoming GAS model is
> that
> > > > they both only allow the vertex values to be updated in each
> iteration.
> > > > However, for some algorithms we need to update the edge values and in
> > > > others we need to update both. In even more complex situations (like
> > > > Andra's MST) in some iterations we need to update the vertex values
> and
> > > in
> > > > some iterations we need to update the edge values.
> > > > Another problem is that we currently don't have a way to allow
> > different
> > > > computational phases inside an iteration. This is something that
> Giraph
> > > > solves with master compute, a function that is executed once before
> > each
> > > > superstep and sets the computation function.
> > > >
> > > > All that said, I believe that we can solve most of these issues if we
> > > > nicely expose Flink's iteration operators in Gelly. I can see the
> > > following
> > > > cases:
> > > >
> > > > 1. Bulk & delta iterations where the solution set is the vertex
> > dataset:
> > > > this will be similar to vertex-centric and GAS, but will allow more
> > > > flexible dataflows inside the iteration.
> > > > 2. Bulk & delta iterations where the solution set is the edge
> dataset:
> > > for
> > > > the cases where we need to update edge values.
> > > > 3. Bulk & delta iterations where the solution set is the Graph: this
> > will
> > > > cover more complex cases, where the algorithm updates both vertices
> and
> > > > edges or even adds/removes vertices/edges, i.e. updates the whole
> > Graph.
> > > >
> > > > What do you think? I can see 1 & 2 being very easy to implement, but
> I
> > > > suspect 3 won't be that easy (but so awesome to have ^^).
> > > > Would it work the way a Graph is represented now, i.e. with 2
> DataSets?
> > > >
> > > > Any comment, idea, pointer would be much appreciated! Thank you ^^
> > > >
> > > > Cheers,
> > > > -V.
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html
> > > > [2]:
> > > >
> > > >
> > >
> >
> http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message