[ https://issues.apache.org/jira/browse/GIRAPH818?page=com.atlassian.jira.plugin.system.issuetabpanels:alltabpanel
]
Yuanyuan Tian updated GIRAPH818:

Description:
This patch an extension based on my paper, From "Think Like a Vertex" to "Think Like a Graph",
published in PVLDB 2013 (http://researcher.watson.ibm.com/researcher/files/usytian/giraph++.pdf).
The basic motivation is as follows. Giraphs divides input graphs into partitions, and employs
a “think like a vertex" programming model to support iterative graph computation. This vertexcentric
model is easy to program and has been proved useful for many graph algorithms. However, this
model hides the partitioning information from the users, thus prevents many algorithmspecific
optimizations. This often results in longer execution time due to excessive network messages.
To address this limitation, we propose a new “think like a graph" programming paradigm.
Under this graphcentric model, the partition structure is opened up to the users, and can
be utilized so that communication within a partition can bypass the heavy message passing.
For example, on a web graph with 118 million vertices and 855 million edges, the graphcentric
version of connected component detection algorithm runs 63X faster and uses 204X fewer network
messages than its vertexcentric counterpart. For more details of this new model, please refer
to the paper.
Note that since the work is done last year, the extension is based on an old version of Giraph
downloaded in June 2012. So, the patch essential rolls back all the changes in Giraph since
June 2012. Performing a diff with a June 2012 version of Giraph will show what changes I made
to support the extension.
To compile the code, simply run “mvn compile”, then the jar file named giraph0.2SNAPSHOTjarwithdependencies.jar
is generated under the target directory. Also note that while I was creating the patch and
run “mvn clean verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.
In this patch, I also included 4 example algorithms to test the new graphcentric model. To
testrun these examples, please download the two example graphs: enron.tgz (the enron email
graph) and enron_undirected.tgz (an undirected version of the enron graph). Unzip them and
put them on hdfs. Then these are the command to run the tests:
 compute the NCut amd Imbalande factor of a partitioning scheme
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.NCut
enron 7 true 7 output
 weakly connected component on an undirected graph
 the graphcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCGraph
enron_undirected out 7 true 7
the hybrid model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7 true
 the basic vertexcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7 false
 pagerank on a directed graph
 the graphcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRGraph
enron output 7 true 7
 the hybrid model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 true
 the vertexcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 false
 graph coarsening on an undirected graph
 the graphcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenGraph
enron_undirected 7 output 7
 the vertexcentricl mdoel
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenVertex
enron_undirected 7 output 7
was:
This patch an extension based on my paper, From "Think Like a Vertex" to "Think Like a Graph",
published in PVLDB 2013 (http://researcher.watson.ibm.com/researcher/files/usytian/giraph++.pdf).
The basic motivation is as follows. Giraphs divides input graphs into partitions, and employs
a “think like a vertex" programming model to support iterative graph computation. This vertexcentric
model is easy to program and has been proved useful for many graph algorithms. However, this
model hides the partitioning information from the users, thus prevents many algorithmspecific
optimizations. This often results in longer execution time due to excessive network messages.
To address this limitation, we propose a new “think like a graph" programming paradigm.
Under this graphcentric model, the partition structure is opened up to the users, and can
be utilized so that communication within a partition can bypass the heavy message passing.
For example, on a web graph with 118 million vertices and 855 million edges, the graphcentric
version of connected component detection algorithm runs 63X faster and uses 204X fewer network
messages than its vertexcentric counterpart. For more details of this new model, please refer
to the paper.
Note that since the work is done last year, the extension is based on an old version of Giraph
downloaded in June 2012. So, the patch essential rolls back all the changes in Giraph since
June 2012. Performing a diff with a June 2012 version of Giraph will show what changes I made
to support the extension.
To compile the code, simply run “mvn compile”, then the jar file named giraph0.2SNAPSHOTjarwithdependencies.jar
is generated under the target directory. Also note that while I was creating the patch and
run “mvn clean verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.
In this patch, I also included 4 example algorithms to test the new graphcentric model. To
testrun these examples, please download the two example graphs: enron.tgz (the enron email
graph) and enron_undirected.tgz (an undirected version of the enron graph). Unzip them and
put them on hdfs. Then these are the command to run the tests:
# compute the NCut amd Imbalande factor of a partitioning scheme
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.NCut
enron 7 true 7 output
# weakly connected component on an undirected graph
# the graphcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCGraph
enron_undirected out 7 true 7
# the hybrid model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7 true
# the basic vertexcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7 false
# pagerank on a directed graph
# the graphcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRGraph
enron output 7 true 7
# the hybrid model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 true
# the vertexcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 false
# graph coarsening on an undirected graph
# the graphcentric model
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenGraph
enron_undirected 7 output 7
# the vertexcentricl mdoel
hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenVertex
enron_undirected 7 output 7
> Extending Giraph with a GraphCentric Programming API
> 
>
> Key: GIRAPH818
> URL: https://issues.apache.org/jira/browse/GIRAPH818
> Project: Giraph
> Issue Type: Improvement
> Components: benchmark
> Affects Versions: 1.1.0
> Reporter: Yuanyuan Tian
> Labels: patch
> Fix For: 1.1.0
>
> Attachments: GIRAPH798.patch, enron.tgz, enron_undirected.tgz
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> This patch an extension based on my paper, From "Think Like a Vertex" to "Think Like
a Graph", published in PVLDB 2013 (http://researcher.watson.ibm.com/researcher/files/usytian/giraph++.pdf).
> The basic motivation is as follows. Giraphs divides input graphs into partitions, and
employs a “think like a vertex" programming model to support iterative graph computation.
This vertexcentric model is easy to program and has been proved useful for many graph algorithms.
However, this model hides the partitioning information from the users, thus prevents many
algorithmspecific optimizations. This often results in longer execution time due to excessive
network messages. To address this limitation, we propose a new “think like a graph" programming
paradigm. Under this graphcentric model, the partition structure is opened up to the users,
and can be utilized so that communication within a partition can bypass the heavy message
passing. For example, on a web graph with 118 million vertices and 855 million edges, the
graphcentric version of connected component detection algorithm runs 63X faster and uses
204X fewer network messages than its vertexcentric counterpart. For more details of this
new model, please refer to the paper.
> Note that since the work is done last year, the extension is based on an old version
of Giraph downloaded in June 2012. So, the patch essential rolls back all the changes in Giraph
since June 2012. Performing a diff with a June 2012 version of Giraph will show what changes
I made to support the extension.
> To compile the code, simply run “mvn compile”, then the jar file named giraph0.2SNAPSHOTjarwithdependencies.jar
is generated under the target directory. Also note that while I was creating the patch and
run “mvn clean verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.
> In this patch, I also included 4 example algorithms to test the new graphcentric model.
To testrun these examples, please download the two example graphs: enron.tgz (the enron email
graph) and enron_undirected.tgz (an undirected version of the enron graph). Unzip them and
put them on hdfs. Then these are the command to run the tests:
>  compute the NCut amd Imbalande factor of a partitioning scheme
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.NCut
enron 7 true 7 output
>  weakly connected component on an undirected graph
>  the graphcentric model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCGraph
enron_undirected out 7 true 7
> the hybrid model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7 true
>  the basic vertexcentric model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.wcc.WCCVertex
enron_undirected out 7 true 7 false
>  pagerank on a directed graph
>  the graphcentric model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRGraph
enron output 7 true 7
>  the hybrid model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 true
>  the vertexcentric model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.pagerank.DeltaPRVertex
enron output 7 true 7 false
>  graph coarsening on an undirected graph
>  the graphcentric model
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenGraph
enron_undirected 7 output 7
>  the vertexcentricl mdoel
> hadoop0.20.203.0/bin/hadoop jar giraph0.2SNAPSHOTjarwithdependencies.jar com.ibm.giraph.graph.example.coarsen.CoarsenVertex
enron_undirected 7 output 7

This message was sent by Atlassian JIRA
(v6.1.4#6159)
