flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: [DISCUSS] Gelly iteration abstractions
Date Mon, 23 Feb 2015 12:03:16 GMT
Hi Stephan,

yes, this would work for the cases where an algorithm only updates the
vertex values or only updates the edge values.

What we would like to also support is
(a) algorithms where both vertices and edges are updated in one iteration
(b) algorithms where the graph structure changes from one iteration to the
next and
(c) branching inside an iteration, i.e. executing a different "iteration
body" based on some condition.

We can still currently implement those with regular Flink iteration
operators, but the resulting code is not that nice or efficient.
For example, if we want to update both edges and vertex values, we can
still create a solution set where the vertex values are attached to each
edge.
Regarding branching inside an iteration, we can use an aggregator that
tracks the iteration phase, but then we need to somehow make the different
phases to consist of the same operators and also check the branching
condition inside each UDF.

Cheers,
V.


On 23 February 2015 at 11:05, Stephan Ewen <sewen@apache.org> wrote:

> As a workaround, it should always work to get the Edge and Vertex data set
> from the graph and use the regular Fink iteration operators?
>
>
> On Sun, Feb 22, 2015 at 4:53 PM, Vasiliki Kalavri <
> vasilikikalavri@gmail.com
> > wrote:
>
> > Hi,
> >
> > yes, I was referring to the parallel Boruvka algorithm. There are several
> > ways to implement this one in Flink and I believe that the one described
> in
> > the paper (vertex-centric) is not the most elegant one :)
> >
> > Andra is now working on an idea that uses the delta iteration abstraction
> > and we believe that it will be both more efficient and easier to
> > understand. It has the edges in the solution set and the vertices in the
> > workset, so it follows the pattern I describe in (2) in my previous
> e-mail.
> > As a next step, we would like to see how having an iteration operator
> that
> > could update the whole graph -what I describe as (3)- would make this
> even
> > nicer.
> >
> > Any ideas are highly welcome!
> >
> > Cheers,
> > V.
> >
> > On 22 February 2015 at 16:32, Andra Lungu <lungu.andra@gmail.com> wrote:
> >
> > > Hi Alex,
> > >
> > > Vasia is talking about the second version(presented Friday) of Parallel
> > > Boruvka, which can be found here:
> > > https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59
> > >
> > > I will propose the third, non-Pregel like approach directly to Gelly
> > soon.
> > >
> > > If you have additional questions, I will be happy to answer them.
> > >
> > > Andra
> > >
> > > On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov <
> > > alexander.s.alexandrov@gmail.com> wrote:
> > >
> > > > Hi Vasia,
> > > >
> > > > I am trying to look at the problem in more detail. Which version of
> the
> > > MST
> > > > are you talking about?
> > > >
> > > > Right now in the Gelly repository I can only find the SSSP example
> > > > (parallel Bellman-Ford) from Section 4.2 in [1].
> > > >
> > > > However, it seems that the issues encountered by Andra are related to
> > the
> > > > implementation of Parallel Boruvka (Section 3.2 in [2]). Is that
> > correct?
> > > >
> > > > Regards,
> > > > A.
> > > >
> > > > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf
> > > > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf
> > > >
> > > > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <
> vasilikikalavri@gmail.com
> > >:
> > > >
> > > > > Hello beautiful Flink people,
> > > > >
> > > > > during the past few days, Andra and I have been discussing about
> how
> > to
> > > > > extend Gelly's iteration methods.
> > > > >
> > > > > Alexander's course (and his awesome students) has made it obvious
> > that
> > > > > vertex-centric iterations are not the best fit for algorithms which
> > > don't
> > > > > follow the common "propagate-update" pattern. For example, Andra
is
> > > > working
> > > > > on an implementation of Minimum Spanning Tree, which requires
> > branching
> > > > > inside an iteration and also requires a convergence check of an
> > > internal
> > > > > iteration. Others also reported similar issues [1, 2]. Trying to
> fit
> > > such
> > > > > algorithms to the vertex-centric model leads to long and ugly code,
> > > e.g.
> > > > > aggregators to keep track of algorithm phases, duplicating data,
> etc.
> > > > >
> > > > > One limitation of the vertex-centric and the upcoming GAS model is
> > that
> > > > > they both only allow the vertex values to be updated in each
> > iteration.
> > > > > However, for some algorithms we need to update the edge values and
> in
> > > > > others we need to update both. In even more complex situations
> (like
> > > > > Andra's MST) in some iterations we need to update the vertex values
> > and
> > > > in
> > > > > some iterations we need to update the edge values.
> > > > > Another problem is that we currently don't have a way to allow
> > > different
> > > > > computational phases inside an iteration. This is something that
> > Giraph
> > > > > solves with master compute, a function that is executed once before
> > > each
> > > > > superstep and sets the computation function.
> > > > >
> > > > > All that said, I believe that we can solve most of these issues if
> we
> > > > > nicely expose Flink's iteration operators in Gelly. I can see the
> > > > following
> > > > > cases:
> > > > >
> > > > > 1. Bulk & delta iterations where the solution set is the vertex
> > > dataset:
> > > > > this will be similar to vertex-centric and GAS, but will allow more
> > > > > flexible dataflows inside the iteration.
> > > > > 2. Bulk & delta iterations where the solution set is the edge
> > dataset:
> > > > for
> > > > > the cases where we need to update edge values.
> > > > > 3. Bulk & delta iterations where the solution set is the Graph:
> this
> > > will
> > > > > cover more complex cases, where the algorithm updates both vertices
> > and
> > > > > edges or even adds/removes vertices/edges, i.e. updates the whole
> > > Graph.
> > > > >
> > > > > What do you think? I can see 1 & 2 being very easy to implement,
> but
> > I
> > > > > suspect 3 won't be that easy (but so awesome to have ^^).
> > > > > Would it work the way a Graph is represented now, i.e. with 2
> > DataSets?
> > > > >
> > > > > Any comment, idea, pointer would be much appreciated! Thank you ^^
> > > > >
> > > > > Cheers,
> > > > > -V.
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html
> > > > > [2]:
> > > > >
> > > > >
> > > >
> > >
> >
> http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message