flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Saumitra Shahapure <saumitra.offic...@gmail.com>
Subject Design document for FLINK-2254
Date Wed, 21 Oct 2015 17:45:59 GMT
In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we want to extend
Graph API of Gelly by adding support for bipartite graphs too. In the long term, the support
for newer graph types can be added in the same way. Also specialised algorithms for special
types of graphs should be implemented.

For bipartite graph, a new class BipartiteGraph can be implemented which extends Graph. Other
graph algorithms which are written for Graph can be used for BipartiteGraph too, because of
inheritance. 
Initialisation functions like  public static <K, VV, EV> Graph<K, VV, EV> fromCollection(Collection<Vertex<K,
VV>> vertices,Collection<Edge<K, EV>> edges, ExecutionEnvironment context)
need to be duplicated as public static <K, VV, EV> BipartiteGraph<K, VV, EV> fromCollection(Collection<Vertex<K,
VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices, Collection<Edge<K,
EV>> edges, ExecutionEnvironment context)
Graph validation does not need to happen implicitly. user can call BipartiteGraph.validate()
in case he wants to check sanity of data.
Vertex modes is primary requirement. BipartiteGraph should have functions, getTopVertices()
and getBottomVertices(). There are three ways to implement it:
Simplest and least efficient way is to maintain vertices variable in Graph in addition to
two more Datasets topVertices, bottomVertices. Benefit of this approach is that Graph class
would not need any modification at all. this.vertices variable access inside Graph class would
work correctly. Disadvantage is that multiple copies of vertex dataset are created and maintained.
So this is inefficient in terms of memory.
Another way is to keep topVertices and bottomVertices variables in BipartiteGraph. vertices
variable in Graph would stay empty. getVertices() function of Graph would be overloaded by
BipartiteGraph and reimplemented as union of of topVertices and bottomVertices. Since, vertices
is a private variable, external view of vertices stays unaffected. All functions of Graph
class which use vertices local variable (e.g. getVerticesAsTuple2()) need to use getVertices()
instead of this.vertices. Disadvantage of this method is Graph.vertices variable would have
invalid value throughout for BipartiteGraph.
Another way is to ‘label’ vertices with an integer. So public class BipartiteGraph<K,VV,EV>
extends Graph<K, Tuple2<VV,Integer>, EV> would be the signature. Initialisers
would tag the vertices according to their mode. getVertices() should be overridden to strip-off
the tag values so that users would get consistent view of vertex dataset for Graph and BipartiteGraph.
getTopVertices/getBottomVertices would be filter functions on vertex tags.
I personally feel method 2 to be most elegant.
Functions like addVertices(vertexList) are not relevant without specifying whether they are
top or bottom. A possibility could be to add them to topVertices by default. And have overloaded
function addVertices(vertexList, mode) to add them to specific partition.
Specialised algorithms for Bipartite graphs can implement new BipartiteGraphAlgorithm interface.
It would be similar to GraphAlgorithm.
In future, newer types of graphs can be similarly derived from Graph and type hierarchy of
Graph can be created. Class hierarchy would be better here rather than interface hierarchy,
because the functions which are applicable to all derived classes can be implemented in the
base class.
As a part of future refactoring, Graph transformation functions should rather be a new class
implementing GraphAlgorithm rather than function in Graph class. This may make implementing
complex graph types easier.
PS. Do I need to add it in wiki too? I can copy the contents to wiki if you say so,
-- Saumitra Shahapure


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message