Date: Sat, 10 Sep 2011 01:05:09 +0000 (UTC)
From: "Avery Ching (JIRA)"
To: giraph-dev@incubator.apache.org
Message-ID: <1198027686.10790.1315616709015.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <336462960.2057.1314587260127.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101710#comment-13101710 ]

Avery Ching commented on GIRAPH-11:
-----------------------------------

Regarding the difference between hash based and hash range based: it refers to how the hash code is assigned to a partition. The application dev will implement hashCode() for their vertex id, and then the assignment of the hashCode() to a partition can be hash based (i.e. hashCode() % # partitions) or range based ([0-a), [a-b), ... etc.). Hope that's clearer.

Code will help. It's coming soon, by the middle of next week I hope.

> Improve the graph distribution of Giraph
> ----------------------------------------
>
>                 Key: GIRAPH-11
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-11
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. If the user data is not sorted by the vertex id, the user must first run a MapReduce or Pig job to generate a sorted dataset. This is often inconvenient.
> Giraph graph partitioning is currently range based, and this approach has both advantages and disadvantages. The proposal of this JIRA is to allow for both range and hash based partitioning and so give the user more flexibility.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
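
For illustration only, the two ways of assigning a hashCode() to a partition described in the comment above could be sketched roughly as follows. This is not the Giraph code (which had not been posted at the time of the comment). The class name, method names, and the upperBounds array are hypothetical, and using Math.floorMod and clearing the sign bit are assumptions made here so that negative hash codes still land in a valid partition.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from Giraph. */
final class PartitionAssignmentSketch {

    /** Hash based: hashCode() % number of partitions. */
    static int hashPartition(Object vertexId, int numPartitions) {
        // Assumption: floorMod instead of % so a negative hashCode()
        // still yields an index in [0, numPartitions).
        return Math.floorMod(vertexId.hashCode(), numPartitions);
    }

    /**
     * Range based: partition 0 owns [0, upperBounds[0]), partition 1 owns
     * [upperBounds[0], upperBounds[1]), and so on. upperBounds must be
     * sorted ascending; each bound is exclusive.
     */
    static int rangePartition(Object vertexId, int[] upperBounds) {
        // Assumption: clear the sign bit so the hash falls in [0, Integer.MAX_VALUE].
        int hash = vertexId.hashCode() & Integer.MAX_VALUE;
        int pos = Arrays.binarySearch(upperBounds, hash);
        // Exact hit on an exclusive bound belongs to the next partition;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        int index = pos >= 0 ? pos + 1 : -(pos + 1);
        if (index >= upperBounds.length) {
            throw new IllegalArgumentException("hash " + hash + " is outside all ranges");
        }
        return index;
    }
}
```

With an Integer vertex id of 7 and 4 partitions, hashPartition gives partition 3. With upper bounds {10, 20, 30}, rangePartition places an id of 5 in partition 0, 10 in partition 1, and 25 in partition 2. Nearby hash codes stay together under the range scheme, whereas the modulo scheme scatters them across partitions.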