flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Martin Kiefer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1552) Allow secondary sorts in Vertex Centric Iteration
Date Wed, 18 Feb 2015 11:46:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325769#comment-14325769

Martin Kiefer commented on FLINK-1552:

I worked on Approximate Maximum Weight Watchings. Optimal Min/Max Weight Matchings are usually
found by variants of the Blossom algorithm, however, the algorithm parallelizes badly. You
can obtain a 1/2-approximation with less complex algorithms.

Salihogulu and Widom proposed several graph algorithms in their paper "Optimizing Graph Algorithms
on Pregel-like Systems" at VLDB 2014. Among these algorithms was a scalable variant of an
Approximate Maximum Weight Matching Algorithm. http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf

I implemented it with Gelly in the plain version presented in the paper. Additionally, we
implemented [Flink-1515], so we could provide an implementation with the optimization the
authors called "Edge Cleaning on Demand (ECOD)". 
I also dirtily implemented a SortedVertexCentricIteration that does adress this issue and
provided two additionaly variants making use of secondary sorts.

However, one could argue that this is not the most beautiful algorithm for an implementation
with Gelly. The algorithm requires you to find the maximum weight edge of vertices and somehow
provide them in the next update step. You also need to do something that is equivalent to
removing edges from vertices. So, you have to choose between two options of wich either one
kind of lacks beauty:

1. Store all Edges in the VertexValue
We then can find the maximum vertex value in the update step and store it in the vertex value.
We can easily remove edges from the vertex value.
This blows up the the VertexValue right from the beginning and makes the messaging CoGroup
in a VertexCentricIteration senseless.

2. Store the VertexKeys of all removed edges in the VertexValue + self messaging
We can find the maximum vertex value in the messaging step and the vertex can send a message
to itself to remember its decision. This hopefully has low cost because the message should
not have to go over the network. We only store the vertex keys for deleted edges in the vertex
state so we can ignore them in the messaging step.

We chose the latter option. 

If you are nevertheless interested in this algorithm I can give you access to the code so
you can have a look at it.

> Allow secondary sorts in Vertex Centric Iteration
> -------------------------------------------------
>                 Key: FLINK-1552
>                 URL: https://issues.apache.org/jira/browse/FLINK-1552
>             Project: Flink
>          Issue Type: Wish
>          Components: Gelly
>            Reporter: Martin Kiefer
>            Priority: Minor
> The `VertexCentricIteration` class holds the logic to transform a `VertexUpdateFunction`
and a `MessagingFunction` into an iteration with two CoGroup operators working on the set
of messages and edges. Graph algorithms can profit from implying an order on the edges or
messages based on their value and/or the vertex ID. This can be implemented easily making
use of secondary sorts. I would suggest extending the `VertexCentricIteration` to allow to
specify these kind of orderings optionally.
> For example, this comes handy when it is necessary to find the edges with the minimum
or maximum value or the algorithm requires to pick the edge with lower vertex ID for edges
with equal value. Similar use cases might be found for orders on the messages. 

This message was sent by Atlassian JIRA

View raw message