Return-Path: X-Original-To: apmail-incubator-giraph-dev-archive@minotaur.apache.org Delivered-To: apmail-incubator-giraph-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DF27892CD for ; Sat, 24 Dec 2011 22:31:54 +0000 (UTC) Received: (qmail 27437 invoked by uid 500); 24 Dec 2011 22:31:54 -0000 Delivered-To: apmail-incubator-giraph-dev-archive@incubator.apache.org Received: (qmail 27390 invoked by uid 500); 24 Dec 2011 22:31:54 -0000 Mailing-List: contact giraph-dev-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: giraph-dev@incubator.apache.org Delivered-To: mailing list giraph-dev@incubator.apache.org Received: (qmail 27382 invoked by uid 99); 24 Dec 2011 22:31:54 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 24 Dec 2011 22:31:54 +0000 X-ASF-Spam-Status: No, hits=-2002.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 24 Dec 2011 22:31:53 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 8544C1270E8 for ; Sat, 24 Dec 2011 22:31:32 +0000 (UTC) Date: Sat, 24 Dec 2011 22:31:32 +0000 (UTC) From: "jiraposter@reviews.apache.org (Commented) (JIRA)" To: giraph-dev@incubator.apache.org Message-ID: <564384448.44326.1324765892547.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <776280181.43966.1324718550683.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (GIRAPH-115) Port of the HCC algorithm for identifying all connected components of a graph MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175780#comment-13175780 ] jiraposter@reviews.apache.org commented on GIRAPH-115: ------------------------------------------------------ ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3313/#review4114 ----------------------------------------------------------- Sebastian, this is really awesome work, thanks for sharing it! While I didn't read the paper, your code looks good and the compute() code is pretty straightforward. IntIntNullIntVertex.java is a good example of how to make a very compact vertex. I only have a few minor formatting requests. In the CODE_CONVENTIONS, comments should be: - All classes, members, and member methods should have Javadoc in the following style. C-style comments for javadoc and // comments for non-javadoc. Also, the comment block should have a line break that separates the comment section and the @ section. While not in the CODE_CONVENTIONS, but should be, Giraph follows spaces in the <>, i.e. . Can you please add spaces i.e. ? I marked as many as I could find, please correct any others I have missed. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java CODE_CONVENTIONS comment suggestion. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java You could shorten this a tad with the foreach pattern instead. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java CODE_CONVENTIONS comment suggestion. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java Could be foreach (again). /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java CODE_CONVENTIONS /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java Capital 'text' /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java NullWritable,IntWritable should be NullWritable, IntWritable /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java String,String -> String, String - Avery On 2011-12-24 09:32:15, Sebastian Schelter wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3313/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-12-24 09:32:15) bq. bq. bq. Review request for giraph. bq. bq. bq. Summary bq. ------- bq. bq. Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component. bq. bq. I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data. bq. bq. I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests. bq. bq. Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges). bq. bq. bq. This addresses bug GIRAPH-115. bq. https://issues.apache.org/jira/browse/GIRAPH-115 bq. bq. bq. Diffs bq. ----- bq. bq. /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION bq. /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION bq. /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION bq. /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION bq. /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION bq. /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837 bq. /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION bq. /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION bq. /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3313/diff bq. bq. bq. Testing bq. ------- bq. bq. bq. Thanks, bq. bq. Sebastian bq. bq. > Port of the HCC algorithm for identifying all connected components of a graph > ----------------------------------------------------------------------------- > > Key: GIRAPH-115 > URL: https://issues.apache.org/jira/browse/GIRAPH-115 > Project: Giraph > Issue Type: New Feature > Affects Versions: 0.70.0 > Reporter: Sebastian Schelter > > Port of the HCC algorithm that identifies connected components and assigns a componented id (the smallest vertex id in the component) to each vertex. > The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the length of the maximum diameter of all components + 1 > The original Hadoop-based variant of this algorithm was proposed by Kang, Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010 > http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira