flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andra Lungu <lungu.an...@gmail.com>
Subject [Proposal] Addition to Gelly
Date Mon, 10 Aug 2015 08:47:06 GMT

Before actually opening a PR, I wanted to hear your opinion. So, here goes
nothing :).

I'd like to add the core of my master thesis to Gelly. That is, a series of
operators that take a skewed graph, split its high degree vertices into
subvertices and redistribute the edges accordingly (thus producing the
result faster due to the increased parallelism); then merge the subvertices
into the initial vertex.

Here's part of my -unrevised- abstract:
"The Twitter follower graph, the citation network and general purpose
social networks follow a power law degree distribution. These highly skewed
graphs raise challenges to current computation models which uniformly
process vertices, leading to communication overhead or large execution time

In spite of efforts made to scale up computation on natural graphs
(PowerGraph ’s Gather - Apply - Scatter model), many performance problems
raise from a subset of vertices that have high degrees. In this thesis, we
outline the main processing issues that arise in current graph systems. We
then propose a novel node splitting technique meant to differentiate the
computation as well as partitioning strategies for low and high degree

The “Node Splitting” abstraction is implemented as a separate set of
operators that can be seamlessly combined with current degree-oblivious
models such as Pregel and PowerGraph, but also with degree-aware engines
such as Pregel+ and PowerLyra. Our solution introduces minimal overhead in
the absence of skew and improves the naive implementation of a subset of
computational intensive algorithms by a factor of two.

These results have been proven on a theoretical and practical level on both
real world and synthetic graphs."

What I would add:
1) the operators per se in a separate package
2) a slightly modified example (e.g. take Jaccard Similarity and apply
these operators on it)
3). tests
4). a separate section in the gelly docs that explains how to modify your
code to use this solution.

Needless to say I will maintain the code and, if, as we get more users,
they deem this sub-API useless, we can always deprecate and delete it :).

So, what do you think?

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message