giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yuanyuan Tian (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-818) Extending Giraph with a Graph-Centric Programming API
Date Fri, 13 Dec 2013 18:41:08 GMT
Yuanyuan Tian created GIRAPH-818:
------------------------------------

             Summary: Extending Giraph with a Graph-Centric Programming API
                 Key: GIRAPH-818
                 URL: https://issues.apache.org/jira/browse/GIRAPH-818
             Project: Giraph
          Issue Type: Improvement
          Components: benchmark
    Affects Versions: 1.1.0
            Reporter: Yuanyuan Tian
             Fix For: 1.1.0


This patch an extension based on my paper, From "Think Like a Vertex" to "Think Like a Graph",
published in PVLDB 2013 (http://researcher.watson.ibm.com/researcher/files/us-ytian/giraph++.pdf).


The basic motivation is as follows. Giraphs divides input graphs into partitions, and employs
a “think like a vertex" programming model to support iterative graph computation. This vertex-centric
model is easy to program and has been proved useful for many graph algorithms. However, this
model hides the partitioning information from the users, thus prevents many algorithm-specific
optimizations. This often results in longer execution time due to excessive network messages.
To address this limitation, we propose a new “think like a graph" programming paradigm.
Under this graph-centric model, the partition structure is opened up to the users, and can
be utilized so that communication within a partition can bypass the heavy message passing.
For example, on a web graph with 118 million vertices and 855 million edges, the graph-centric
version of connected component detection algorithm runs 63X faster and uses 204X fewer network
messages than its vertex-centric counterpart. For more details of this new model, please refer
to the paper.

Note that since the work is done last year, the extension is based on an old version of Giraph
downloaded in June 2012. So, the patch essential rolls back all the changes in Giraph since
June 2012. Performing a diff with a June 2012 version of Giraph will show what changes I made
to support the extension.

To compile the code, simply run “mvn compile”, then the jar file named giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
is generated under the target directory. Also note that while I was creating the patch and
run “mvn clean verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.

In this patch, I also included 4 example algorithms to test the new graph-centric model. To
test-run these examples, please download the two example graphs: enron.tgz (the enron email
graph) and enron_undirected.tgz (an undirected version of the enron graph). Unzip them and
put them on hdfs. Then these are the command to run the tests:

# compute the NCut amd Imbalande factor of a partitioning scheme
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.NCut
enron 7 true 7 output

# weakly connected component on an undirected graph
# the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.wcc.WCCGraph
enron_undirected out 7 true 7
# the hybrid model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7  true
# the basic vertex-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7  false

# pagerank on a directed graph
# the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRGraph
enron output 7 true 7
# the hybrid model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 true
# the vertex-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 false

# graph coarsening on an undirected graph
# the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenGraph
enron_undirected 7 output 7
# the vertex-centricl mdoel
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenVertex
enron_undirected 7 output 7




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Mime
View raw message