Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 574BE200BEE for ; Fri, 16 Dec 2016 11:24:41 +0100 (CET)
Received: by cust-asf.ponee.io (Postfix) id 56181160AF6; Fri, 16 Dec 2016 10:24:41 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 82B49160B24 for ; Fri, 16 Dec 2016 11:24:40 +0100 (CET)
Received: (qmail 33910 invoked by uid 500); 16 Dec 2016 10:24:39 -0000
Mailing-List: contact commits-help@flink.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@flink.apache.org
Delivered-To: mailing list commits@flink.apache.org
Received: (qmail 33901 invoked by uid 99); 16 Dec 2016 10:24:39 -0000
Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 16 Dec 2016 10:24:39 +0000
Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 72E78E36D9; Fri, 16 Dec 2016 10:24:39 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: vasia@apache.org
To: commits@flink.apache.org
Date: Fri, 16 Dec 2016 10:24:40 -0000
Message-Id:
In-Reply-To: <9fb7238f23784dcd8c37ce05c0a0b595@git.apache.org>
References: <9fb7238f23784dcd8c37ce05c0a0b595@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [2/2] flink git commit: [FLINK-5311] [gelly] [docs] Add user documentation for bipartite graph
archived-at: Fri, 16 Dec 2016 10:24:41 -0000

[FLINK-5311] [gelly] [docs] Add user documentation for bipartite graph

This closes #2984


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/3d41f2b8
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/3d41f2b8
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/3d41f2b8

Branch: refs/heads/master
Commit: 3d41f2b821dc1c7b6496756f037338d2069f9639
Parents: 88e458b
Author: Ivan Mushketyk
Authored: Sat Dec 10 17:41:30 2016 +0000
Committer: vasia
Committed: Fri Dec 16 10:42:46 2016 +0100

----------------------------------------------------------------------
 docs/dev/libs/gelly/bipartite_graph.md   | 185 ++++++++++++++++++++++++++
 docs/dev/libs/gelly/index.md             |   1 +
 docs/fig/bipartite_graph_projections.png | Bin 0 -> 49335 bytes
 3 files changed, 186 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/3d41f2b8/docs/dev/libs/gelly/bipartite_graph.md
----------------------------------------------------------------------
diff --git a/docs/dev/libs/gelly/bipartite_graph.md b/docs/dev/libs/gelly/bipartite_graph.md
new file mode 100644
index 0000000..ac57e3b
--- /dev/null
+++ b/docs/dev/libs/gelly/bipartite_graph.md
@@ -0,0 +1,185 @@
+---
+title: Bipartite Graph
+nav-parent_id: graphs
+nav-pos: 6
+---
+
+
+Attention: Bipartite graphs are currently only supported in the Gelly Java API.
+
+* This will be replaced by the TOC
+{:toc}
+
+Bipartite Graph
+---------------
+
+A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called the top and bottom vertices. An edge in this graph can only connect vertices from opposite sets (i.e. a bottom vertex to a top vertex) and cannot connect two vertices in the same set.
+
+These graphs are widely used in practice and can be a more natural choice for particular domains. For example, to represent the authorship of scientific papers, top vertices can represent papers while bottom vertices represent authors.
+Naturally, an edge between a top and a bottom vertex then represents the authorship of a particular paper. Another common application of bipartite graphs is modeling the relationship between actors and movies, where an edge indicates that a particular actor played in a movie.
+
+Bipartite graphs are used instead of regular (one-mode) graphs for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf):
+ * They preserve more information about a connection between vertices. For example, instead of a single link between two researchers indicating that they co-authored a paper, a bipartite graph preserves the information about which papers they authored.
+ * Bipartite graphs can encode the same information more compactly than one-mode graphs.
+
+
+
+Graph Representation
+--------------------
+
+A `BipartiteGraph` is represented by:
+ * A `DataSet` of top nodes
+ * A `DataSet` of bottom nodes
+ * A `DataSet` of edges between top and bottom nodes
+
+As in the `Graph` class, nodes are represented by the `Vertex` type, and the same rules apply to its types and values.
+
+The graph edges are represented by the `BipartiteEdge` type. A `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between `Edge` and `BipartiteEdge` is that the two IDs of a `BipartiteEdge` can be of different types. Edges with no value have a `NullValue` value type.
+
+
+
+{% highlight java %}
+BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5);
+
+Double weight = e.getValue(); // weight = 0.5
+{% endhighlight %}
+
+ +
+{% highlight scala %}
+// Scala API is not yet supported
+{% endhighlight %}
+
+
+{% top %}
+
+
+Graph Creation
+--------------
+
+You can create a `BipartiteGraph` in the following way:
+
+* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges:
+
+
+
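+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+// The key and value types below are examples; use the types of your own data
+DataSet<Vertex<Long, String>> topVertices = ...
+
+DataSet<Vertex<Long, String>> bottomVertices = ...
+
+DataSet<BipartiteEdge<Long, Long, String>> edges = ...
+
+BipartiteGraph<Long, Long, String, String, String> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env);
+{% endhighlight %}
+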
+ +
+{% highlight scala %}
+// Scala API is not yet supported
+{% endhighlight %}
+
+
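To make the three inputs of `fromDataSet` concrete without a Flink cluster, here is a minimal plain-Java sketch that keeps top vertices, bottom vertices and edges in ordinary collections. `SimpleBipartiteGraph`, `BEdge` and `fromCollections` are hypothetical names, not Gelly API, and the endpoint check is an assumption of this sketch rather than documented Gelly behavior. It needs Java 16+ for records.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free model of a bipartite graph; not the Gelly API.
class SimpleBipartiteGraph<KT, KB, VVT, VVB, EV> {

    // An edge always links a top ID to a bottom ID; the two ID types may differ.
    record BEdge<T, B, V>(T topId, B bottomId, V value) {}

    final Map<KT, VVT> topVertices;
    final Map<KB, VVB> bottomVertices;
    final List<BEdge<KT, KB, EV>> edges;

    private SimpleBipartiteGraph(Map<KT, VVT> top, Map<KB, VVB> bottom, List<BEdge<KT, KB, EV>> edges) {
        this.topVertices = top;
        this.bottomVertices = bottom;
        this.edges = edges;
    }

    // Builds the graph, rejecting edges whose top or bottom endpoint is not a known vertex.
    static <KT, KB, VVT, VVB, EV> SimpleBipartiteGraph<KT, KB, VVT, VVB, EV> fromCollections(
            Map<KT, VVT> top, Map<KB, VVB> bottom, List<BEdge<KT, KB, EV>> edges) {
        for (BEdge<KT, KB, EV> e : edges) {
            if (!top.containsKey(e.topId()) || !bottom.containsKey(e.bottomId())) {
                throw new IllegalArgumentException("Edge endpoint not found: " + e);
            }
        }
        return new SimpleBipartiteGraph<>(Map.copyOf(top), Map.copyOf(bottom), List.copyOf(edges));
    }

    public static void main(String[] args) {
        // Top IDs are Longs, bottom IDs are Strings, edge values are Doubles.
        List<BEdge<Long, String, Double>> edges = List.of(new BEdge<>(1L, "id1", 0.5));
        SimpleBipartiteGraph<Long, String, String, String, Double> graph =
                fromCollections(Map.of(1L, "top1"), Map.of("id1", "bottom1"), edges);
        System.out.println(graph.edges.get(0).value()); // prints 0.5
    }
}
```

Because the top and bottom ID types are separate type parameters, an edge that swaps them fails to compile, which mirrors the top/bottom separation described above.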
+
+
+Graph Transformations
+---------------------
+
+
+* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. A top projection preserves only the top nodes in the result graph and links two of them only if there is an intermediate bottom node that both top nodes connect to in the original graph. A bottom projection is the opposite of a top projection: it preserves only the bottom nodes and links a pair of them only if they share a common top node in the original graph.
+
+

+ Bipartite Graph Projections +

+
+Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is the data associated with the edges of the result graph.
+
+In the case of a simple projection, each edge in the result graph contains a pair of values of the bipartite edges that connect its endpoints in the original graph:
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+// Vertices (1, "top1")
+DataSet<Vertex<Long, String>> topVertices = ...
+
+// Vertices (2, "bottom2"); (4, "bottom4")
+DataSet<Vertex<Long, String>> bottomVertices = ...
+
+// Edges that connect vertex 2 to vertex 1 and vertex 4 to vertex 1:
+// (1, 2, "1-2-edge"); (1, 4, "1-4-edge")
+DataSet<BipartiteEdge<Long, Long, String>> edges = ...
+
+BipartiteGraph<Long, Long, String, String, String> bipartiteGraph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env);
+
+// Result graph with two vertices:
+// (2, "bottom2"); (4, "bottom4")
+//
+// and one edge that contains the IDs of the bottom vertices and a tuple with
+// the values of the intermediate edges in the original bipartite graph:
+// (2, 4, ("1-2-edge", "1-4-edge"))
+Graph<Long, String, Tuple2<String, String>> graph = bipartiteGraph.projectionBottomSimple();
+
+{% endhighlight %}
+
+ +
+{% highlight scala %}
+// Scala API is not yet supported
+{% endhighlight %}
+
+
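Here is a plain-Java sketch of the simple bottom projection, using the example data above. It does not reflect how Gelly implements `projectionBottomSimple()`; the record names are hypothetical, and emitting each pair of bottom vertices once per shared top vertex, ordered by ID, is a choice made here for readability.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free sketch of a simple bottom projection; not Gelly's implementation.
class SimpleProjectionSketch {

    record BEdge(long top, long bottom, String value) {}

    // Edge of the projected graph: its value is the pair of values of the two bipartite edges.
    record ProjectedEdge(long source, long target, String sourceEdgeValue, String targetEdgeValue) {}

    static List<ProjectedEdge> projectionBottomSimple(List<BEdge> edges) {
        // Group the bipartite edges by their top (intermediate) vertex.
        Map<Long, List<BEdge>> byTop = new LinkedHashMap<>();
        for (BEdge e : edges) {
            byTop.computeIfAbsent(e.top(), k -> new ArrayList<>()).add(e);
        }
        List<ProjectedEdge> result = new ArrayList<>();
        for (List<BEdge> group : byTop.values()) {
            for (BEdge e1 : group) {
                for (BEdge e2 : group) {
                    // Emit each pair of distinct bottom vertices once, smaller ID as source.
                    if (e1.bottom() < e2.bottom()) {
                        result.add(new ProjectedEdge(e1.bottom(), e2.bottom(), e1.value(), e2.value()));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<BEdge> edges = List.of(new BEdge(1, 2, "1-2-edge"), new BEdge(1, 4, "1-4-edge"));
        System.out.println(projectionBottomSimple(edges));
        // prints [ProjectedEdge[source=2, target=4, sourceEdgeValue=1-2-edge, targetEdgeValue=1-4-edge]]
    }
}
```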
+
+A full projection preserves all the information about the connection between two vertices and stores it in `Projection` instances. This includes the ID and value of the intermediate vertex, the source and target vertex values, and the source and target edge values:
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+// Vertices (1, "top1")
+DataSet<Vertex<Long, String>> topVertices = ...
+
+// Vertices (2, "bottom2"); (4, "bottom4")
+DataSet<Vertex<Long, String>> bottomVertices = ...
+
+// Edges that connect vertex 2 to vertex 1 and vertex 4 to vertex 1:
+// (1, 2, "1-2-edge"); (1, 4, "1-4-edge")
+DataSet<BipartiteEdge<Long, Long, String>> edges = ...
+
+BipartiteGraph<Long, Long, String, String, String> bipartiteGraph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env);
+
+// Result graph with two vertices:
+// (2, "bottom2"); (4, "bottom4")
+// and one edge that contains the IDs of the bottom vertices and a tuple that
+// contains the ID and value of the intermediate vertex, the values of the connected vertices
+// and the values of the intermediate edges in the original bipartite graph:
+// (2, 4, (1, "top1", "bottom2", "bottom4", "1-2-edge", "1-4-edge"))
+Graph<Long, String, Projection<Long, String, String, String>> graph = bipartiteGraph.projectionBottomFull();
+
+{% endhighlight %}
+
+ +
+{% highlight scala %}
+// Scala API is not yet supported
+{% endhighlight %}
+
+
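The following hypothetical sketch shows which pieces of information a full projection edge carries, again in plain Java with the example data above. `FullProjection` and its field names only mirror the list in the text and are not Gelly's `Projection` class; the nested loop is quadratic in the number of edges and is meant for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free sketch of a full bottom projection; not Gelly's implementation.
class FullProjectionSketch {

    record BEdge(long top, long bottom, String value) {}

    // Mirrors the information listed for a full projection edge.
    record FullProjection(long intermediateVertexId, String intermediateVertexValue,
                          String sourceVertexValue, String targetVertexValue,
                          String sourceEdgeValue, String targetEdgeValue) {}

    record ProjectedEdge(long source, long target, FullProjection value) {}

    static List<ProjectedEdge> projectionBottomFull(Map<Long, String> topValues,
                                                    Map<Long, String> bottomValues,
                                                    List<BEdge> edges) {
        List<ProjectedEdge> result = new ArrayList<>();
        for (BEdge e1 : edges) {
            for (BEdge e2 : edges) {
                // Two bottom vertices are linked through a shared top (intermediate) vertex.
                if (e1.top() == e2.top() && e1.bottom() < e2.bottom()) {
                    FullProjection p = new FullProjection(
                            e1.top(), topValues.get(e1.top()),
                            bottomValues.get(e1.bottom()), bottomValues.get(e2.bottom()),
                            e1.value(), e2.value());
                    result.add(new ProjectedEdge(e1.bottom(), e2.bottom(), p));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ProjectedEdge> projected = projectionBottomFull(
                Map.of(1L, "top1"),
                Map.of(2L, "bottom2", 4L, "bottom4"),
                List.of(new BEdge(1, 2, "1-2-edge"), new BEdge(1, 4, "1-4-edge")));
        System.out.println(projected.get(0).value());
        // prints FullProjection[intermediateVertexId=1, intermediateVertexValue=top1, sourceVertexValue=bottom2, targetVertexValue=bottom4, sourceEdgeValue=1-2-edge, targetEdgeValue=1-4-edge]
    }
}
```

Compared with the simple projection, the only change is the richer edge value; the set of linked vertex pairs is the same.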
http://git-wip-us.apache.org/repos/asf/flink/blob/3d41f2b8/docs/dev/libs/gelly/index.md
----------------------------------------------------------------------
diff --git a/docs/dev/libs/gelly/index.md b/docs/dev/libs/gelly/index.md
index 0877e2f..6bcdc82 100644
--- a/docs/dev/libs/gelly/index.md
+++ b/docs/dev/libs/gelly/index.md
@@ -33,6 +33,7 @@ Gelly is a Graph API for Flink. It contains a set of methods and utilities which
 * [Library Methods](library_methods.html)
 * [Graph Algorithms](graph_algorithms.html)
 * [Graph Generators](graph_generators.html)
+* [Bipartite Graphs](bipartite_graph.html)
 
 Using Gelly
 -----------


http://git-wip-us.apache.org/repos/asf/flink/blob/3d41f2b8/docs/fig/bipartite_graph_projections.png
----------------------------------------------------------------------
diff --git a/docs/fig/bipartite_graph_projections.png b/docs/fig/bipartite_graph_projections.png
new file mode 100644
index 0000000..e3be4ec
Binary files /dev/null and b/docs/fig/bipartite_graph_projections.png differ