flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Hogan <c...@greghogan.com>
Subject Re: [DISCUSS] Graph algorithms for vertex and edge degree
Date Mon, 25 Apr 2016 20:49:53 GMT
Hi Fabian,

I don't know if this has been looked at. There is discussion of
BipartiteGraph in FLINK-2254. If Gelly had DirectedGraph and
UndirectedGraph then the API could stay lean while methods could be tuned
for the specific graph type. I do like having simple APIs such as DataSet
and Graph for for interactive use in small scripts or Scala and Python
shells.

Greg

On Mon, Apr 25, 2016 at 8:46 AM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Greg and Vasia,
>
> thanks for starting this discussion.
> I think it is a good idea to update the Gelly roadmap. As Vasia said, many
> items on the list have been implemented and other have been more or less
> dropped.
> Also new persons who want to improve Gelly have joint the community while
> others have become idle.
> So from my point of view it makes perfect sense to gather the input of the
> community and update the Gelly roadmap.
>
> Regarding the specific changes of FLINK-3772, I am usually always in favor
> of performance improvements.
> How much would the suggested functionality for degree computation overlaps
> with the current methods?
> Could the current methods be replaced by the more efficient implementations
> or would there be two methods which look very similar and behave almost the
> same?
>
> Best, Fabian
>
>
>
> 2016-04-22 11:00 GMT+02:00 Vasiliki Kalavri <vasilikikalavri@gmail.com>:
>
> > Hi all,
> >
> > I asked Greg to start a discussion here about FLINK-3771 and FLINK-3772,
> to
> > make sure we're all on the same page about future Gelly development.
> >
> > About a year ago we created the Gelly roadmap [1]. Many of these items
> have
> > been implemented and others were researched and either developed
> externally
> > or dropped. Graph translators and degree annotations were not in the
> > roadmap, but I personally think that they are both helpful features and
> I'm
> > in favor of adding them to Gelly.
> >
> > That said, I find this a perfect timing for revisiting and updating our
> > Gelly development plans. We should update the roadmap so that it reflects
> > current state and future plans. This way, the community can have a clear
> > picture of what we are working on and what are upcoming features. This is
> > also very important for new contributors. They can always refer to the
> > roadmap to see what the community is interested in and we can avoid
> having
> > people spending their time on obsolete or dropped features. We should
> also
> > go through existing JIRAs and clean them up. Several are assigned to
> people
> > who have been inactive or might need help.
> >
> > In the following days I will go through the roadmap and then start a new
> > thread where we can discuss features and agree on future plans. In the
> > meantime, it would be great if you give us your view on FLINK-3771 and
> > FLINK-3772.
> >
> > Cheers,
> > -Vasia.
> >
> > [1]: https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly
> >
> > On 21 April 2016 at 18:52, Greg Hogan <code@greghogan.com> wrote:
> >
> > > Vasia and I are looking for additional feedback on FLINK-3772. This
> > ticket
> > > [0] and PR [1] provides a set of graph algorithms which compute and
> store
> > > the degree for vertices and edges.
> > >
> > > Degree annotation is a basic component of many algorithms. For example,
> > > PageRank requires the vertex out-degree to evenly divide the vertex
> score
> > > among neighbors. The triangle enumeration algorithm in Gelly requires
> > both
> > > source and target degree for each edge as an optimization. The Jaccard
> > and
> > > Adamic-Adar similarities require the target degree for each edge.
> > >
> > > As discussed in the ticket, Graph has methods for outDegrees,
> inDegrees,
> > > and getDegrees. These are simple but difficult or impossible to use
> > > efficiently.
> > >
> > > Stepping back, I believe algorithms should be composable and reused
> where
> > > possible, not only to improve Flink but also to support users.
> > Implementing
> > > algorithms as classes rather than Graph methods enables customization
> and
> > > optimization such as used here.
> > >
> > > One such optimization is CachingGraphAlgorithm which implicitly reuses
> > > DataSets. There is overlap between algorithms. From this ticket,
> > annotating
> > > edges with source and target degree on an undirected graph can reuse
> > vertex
> > > degree. Local clustering coefficient requires a triangle listing and
> > global
> > > clustering coefficient requires a triangle count, there is no need to
> > > generate the list three times.
> > >
> > > Further optimizations include the use of mutable types, reusing sort
> > order,
> > > avoidance of coGroup, configurable parallelism, and not unnecessarily
> > > materializing DataSets. I see all this as the expectation for inclusion
> > in
> > > Flink.
> > >
> > > Greg
> > >
> > > [0] https://issues.apache.org/jira/browse/FLINK-3772
> > > [1] https://github.com/apache/flink/pull/1901/files
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message