From: Larry Xiao
To: dev@spark.apache.org
Date: Thu, 24 Jul 2014 14:59:26 +0800
Subject: GraphX graph partitioning strategy

Hi all,

I'm implementing graph partitioning strategies for GraphX, drawing on recent research in graph computing. I have two questions:

- A specific implementation question: in the current design, a strategy is given only the source and destination vertex IDs (PartitionStrategy.scala). Some strategies, however, need knowledge about the graph (such as vertex degrees) and may take more than one pass to produce the final partition ID. So I'm changing the PartitionStrategy.getPartition API to provide more information, but I don't want to make it complex (the current one looks very clean).

- An open question: what advice would you give on partitioning, given the way Spark processes graphs?

Any advice is much appreciated.

Best regards,
Larry Xiao

References
- Bipartite-oriented Distributed Graph Partitioning for Big Learning
- PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs
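The kind of API extension described in the first question (a strategy that also sees per-vertex degrees computed in an earlier pass) could look roughly like the following. This is a minimal, self-contained sketch, not GraphX code: `DegreeAwarePartitionStrategy`, `HybridCutSketch`, `hashToPart` and `partition` are hypothetical names, and `VertexId`/`PartitionID` are redefined locally to mirror GraphX's type aliases so the example compiles without Spark. The placement rule (hash an edge by its target when the target's in-degree is at or under a threshold, otherwise by its source) is a simplification in the spirit of the hybrid-cut idea from the PowerLyra paper, not that paper's exact algorithm.

```scala
object DegreeAwarePartitionSketch {
  // Local stand-ins for GraphX's aliases (VertexId = Long, PartitionID = Int).
  type VertexId = Long
  type PartitionID = Int

  // Hypothetical extended API (not part of GraphX): besides src/dst, the
  // strategy receives a lookup for in-degrees computed in an earlier pass.
  trait DegreeAwarePartitionStrategy extends Serializable {
    def getPartition(src: VertexId, dst: VertexId,
                     inDegree: VertexId => Int, numParts: PartitionID): PartitionID
  }

  private val mixingPrime: Long = 1125899906842597L

  // Non-negative modulo, so wrapped (negative) products still land in [0, numParts).
  private def hashToPart(id: VertexId, numParts: PartitionID): PartitionID = {
    require(numParts > 0, "numParts must be positive")
    ((((id * mixingPrime) % numParts) + numParts) % numParts).toInt
  }

  // Illustrative degree-threshold rule:
  //  - low in-degree target: co-locate the edge with its target (edge-cut style)
  //  - high in-degree target: spread its in-edges by source (vertex-cut style)
  final case class HybridCutSketch(threshold: Int) extends DegreeAwarePartitionStrategy {
    override def getPartition(src: VertexId, dst: VertexId,
                              inDegree: VertexId => Int, numParts: PartitionID): PartitionID =
      if (inDegree(dst) <= threshold) hashToPart(dst, numParts)
      else hashToPart(src, numParts)
  }

  // Pass 1 counts in-degrees; pass 2 assigns every edge a partition ID.
  def partition(edges: Seq[(VertexId, VertexId)], strategy: DegreeAwarePartitionStrategy,
                numParts: PartitionID): Seq[((VertexId, VertexId), PartitionID)] = {
    val inDeg: Map[VertexId, Int] = edges.groupBy(_._2).map { case (v, es) => v -> es.size }
    val lookup: VertexId => Int = v => inDeg.getOrElse(v, 0)
    edges.map { case e @ (s, d) => e -> strategy.getPartition(s, d, lookup, numParts) }
  }

  def main(args: Array[String]): Unit = {
    // Vertex 1 is a small "hub" with in-degree 4; vertex 6 has in-degree 1.
    val edges = Seq[(VertexId, VertexId)]((2L, 1L), (3L, 1L), (4L, 1L), (5L, 1L), (1L, 6L))
    val assigned = partition(edges, HybridCutSketch(threshold = 2), numParts = 4)
    assigned.foreach { case ((s, d), p) => println(s"$s -> $d : partition $p") }
  }
}
```

Passing degrees through a lookup function keeps the per-edge call nearly as simple as the current `getPartition`. On a real cluster the degree table would have to be computed in a separate pass (for example from `graph.inDegrees`) and made available to executors, which this local sketch does not attempt.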