incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From " (Commented) (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for identifying all connected components of a graph
Date Sun, 25 Dec 2011 09:38:32 GMT

] commented on GIRAPH-115:

This is an automatically generated e-mail. To reply, visit:

(Updated 2011-12-25 09:36:39.177630)

Review request for giraph.


Reworked code to reflect Avery's comments regarding the code conventions.


Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its

I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex
and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest
contains an "integration" test on toy data.

I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners
and to shutdown() the local zookeeper instance in the tests.

Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine
hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).

This addresses bug GIRAPH-115.

Diffs (updated)

  /trunk/src/main/java/org/apache/giraph/examples/ PRE-CREATION

  /trunk/src/main/java/org/apache/giraph/examples/ PRE-CREATION

  /trunk/src/main/java/org/apache/giraph/examples/ PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/graph/ PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/utils/ 1222837 
  /trunk/src/main/java/org/apache/giraph/utils/ PRE-CREATION

  /trunk/src/test/java/org/apache/giraph/examples/ PRE-CREATION

  /trunk/src/test/java/org/apache/giraph/examples/ PRE-CREATION

  /trunk/src/test/java/org/apache/giraph/examples/ 1222837





> Port of the HCC algorithm for identifying all connected components of a graph
> -----------------------------------------------------------------------------
>                 Key: GIRAPH-115
>                 URL:
>             Project: Giraph
>          Issue Type: New Feature
>    Affects Versions: 0.70.0
>            Reporter: Sebastian Schelter
> Port of the HCC algorithm that identifies connected components and assigns a componented
id (the smallest vertex id in the component) to each vertex.
> The idea behind the algorithm is very simple: propagate the smallest vertex id along
the edges to all vertices of a connected component until convergence. The number of supersteps
necessary is equal to the length of the maximum diameter of all components + 1
> The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos,
Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message