flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andra Lungu <lungu.an...@gmail.com>
Subject Re: Design document for FLINK-2254
Date Thu, 22 Oct 2015 08:53:58 GMT
Hi Saumitra,

As you already noticed, the first version (with duplicates) is highly
inefficient and consumes a lot of memory. So, I suggest we drop it for now.
The version with the label makes a lot of modifications on the base Graph
class, and this, in my opinion would make it more difficult to grasp.
Bipartite Graphs are not as common as regular graphs. And then, when you
would add an Isomorphic Graph class (stupid example), you'll need to strip
stuff out again.

I believe we should go with a BipartiteGraph class which extends Graph and
which has a DataSet of topVertices and a DataSet of bottomVertices. Apart
from getTopVertices() methods, we would still need functions such as
getNumberOfTopVertices(). For the time being, just support the funstiones
mentiones as well as the basics: fromCollection, fromDataSet, etc, get*,
join*, map*, filter*.

Wait for a few more opinions and dig in. I believe we should discuss this
even further and propose Jiras for examples of algos for bipartite graphs.

Once we have the design specs well figured out, and if I find some time
I'll gladly chime in :)


On Wed, Oct 21, 2015 at 8:45 PM, Saumitra Shahapure <
saumitra.official@gmail.com> wrote:

> In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we
> want to extend Graph API of Gelly by adding support for bipartite graphs
> too. In the long term, the support for newer graph types can be added in
> the same way. Also specialised algorithms for special types of graphs
> should be implemented.
> For bipartite graph, a new class BipartiteGraph can be implemented which
> extends Graph. Other graph algorithms which are written for Graph can be
> used for BipartiteGraph too, because of inheritance.
> Initialisation functions like  public static <K, VV, EV> Graph<K, VV, EV>
> fromCollection(Collection<Vertex<K, VV>> vertices,Collection<Edge<K,
> edges, ExecutionEnvironment context) need to be duplicated as public static
> <K, VV, EV> BipartiteGraph<K, VV, EV> fromCollection(Collection<Vertex<K,
> VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices,
> Collection<Edge<K, EV>> edges, ExecutionEnvironment context)
> Graph validation does not need to happen implicitly. user can call
> BipartiteGraph.validate() in case he wants to check sanity of data.
> Vertex modes is primary requirement. BipartiteGraph should have functions,
> getTopVertices() and getBottomVertices(). There are three ways to implement
> it:
> Simplest and least efficient way is to maintain vertices variable in Graph
> in addition to two more Datasets topVertices, bottomVertices. Benefit of
> this approach is that Graph class would not need any modification at all.
> this.vertices variable access inside Graph class would work correctly.
> Disadvantage is that multiple copies of vertex dataset are created and
> maintained. So this is inefficient in terms of memory.
> Another way is to keep topVertices and bottomVertices variables in
> BipartiteGraph. vertices variable in Graph would stay empty. getVertices()
> function of Graph would be overloaded by BipartiteGraph and reimplemented
> as union of of topVertices and bottomVertices. Since, vertices is a private
> variable, external view of vertices stays unaffected. All functions of
> Graph class which use vertices local variable (e.g. getVerticesAsTuple2())
> need to use getVertices() instead of this.vertices. Disadvantage of this
> method is Graph.vertices variable would have invalid value throughout for
> BipartiteGraph.
> Another way is to ‘label’ vertices with an integer. So public class
> BipartiteGraph<K,VV,EV> extends Graph<K, Tuple2<VV,Integer>, EV> would
> the signature. Initialisers would tag the vertices according to their mode.
> getVertices() should be overridden to strip-off the tag values so that
> users would get consistent view of vertex dataset for Graph and
> BipartiteGraph. getTopVertices/getBottomVertices would be filter functions
> on vertex tags.
> I personally feel method 2 to be most elegant.
> Functions like addVertices(vertexList) are not relevant without specifying
> whether they are top or bottom. A possibility could be to add them to
> topVertices by default. And have overloaded function
> addVertices(vertexList, mode) to add them to specific partition.
> Specialised algorithms for Bipartite graphs can implement new
> BipartiteGraphAlgorithm interface. It would be similar to GraphAlgorithm.
> In future, newer types of graphs can be similarly derived from Graph and
> type hierarchy of Graph can be created. Class hierarchy would be better
> here rather than interface hierarchy, because the functions which are
> applicable to all derived classes can be implemented in the base class.
> As a part of future refactoring, Graph transformation functions should
> rather be a new class implementing GraphAlgorithm rather than function in
> Graph class. This may make implementing complex graph types easier.
> PS. Do I need to add it in wiki too? I can copy the contents to wiki if
> you say so,
> -- Saumitra Shahapure

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message