Return-Path: X-Original-To: apmail-spark-dev-archive@minotaur.apache.org Delivered-To: apmail-spark-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4DAA317BE9 for ; Thu, 25 Sep 2014 16:41:46 +0000 (UTC) Received: (qmail 11059 invoked by uid 500); 25 Sep 2014 16:41:43 -0000 Delivered-To: apmail-spark-dev-archive@spark.apache.org Received: (qmail 10982 invoked by uid 500); 25 Sep 2014 16:41:43 -0000 Mailing-List: contact dev-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@spark.apache.org Received: (qmail 10071 invoked by uid 99); 25 Sep 2014 16:41:42 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Sep 2014 16:41:42 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of xiaodi@sjtu.edu.cn designates 202.112.26.52 as permitted sender) Received: from [202.112.26.52] (HELO proxy01.sjtu.edu.cn) (202.112.26.52) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Sep 2014 16:41:35 +0000 Received: from proxy03.sjtu.edu.cn (unknown [202.121.179.33]) by proxy01.sjtu.edu.cn (Postfix) with ESMTP id F24552600C0; Fri, 26 Sep 2014 00:41:12 +0800 (CST) Received: from localhost (localhost [127.0.0.1]) by proxy03.sjtu.edu.cn (Postfix) with ESMTP id E3473260E14; Fri, 26 Sep 2014 00:41:12 +0800 (GMT-8) X-Virus-Scanned: amavisd-new at Received: from proxy03.sjtu.edu.cn ([127.0.0.1]) by localhost (proxy03.sjtu.edu.cn [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aW86g6Pk5PJO; Fri, 26 Sep 2014 00:41:12 +0800 (GMT-8) Received: from Loca.local (unknown [59.78.3.8]) (Authenticated sender: xiaodi) by proxy03.sjtu.edu.cn (Postfix) with ESMTPSA id B3F01260BFA; Fri, 26 Sep 2014 00:41:12 +0800 (GMT-8) Message-ID: <542445A8.3000005@sjtu.edu.cn> Date: Fri, 26 Sep 2014 00:41:12 +0800 From: Larry Xiao User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: dev@spark.apache.org, user@spark.apache.org CC: xiaodi@sjtu.edu.cn Subject: VertexRDD partition imbalance Content-Type: multipart/alternative; boundary="------------010807040009060509060807" X-Virus-Checked: Checked by ClamAV on apache.org --------------010807040009060509060807 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Hi all VertexRDD is partitioned with HashPartitioner, and it exhibits some imbalance of tasks. For example, Connected Components with partition strategy Edge2D: Aggregated Metrics by Executor Executor ID Task Time Total Tasks Failed Tasks Succeeded Tasks Input Shuffle Read Shuffle Write Shuffle Spill (Memory) Shuffle Spill (Disk) 1 10 s 10 0 10 234.6 MB 0.0 B 43.2 MB 0.0 B 0.0 B 2 3 s 3 0 3 70.4 MB 0.0 B 13.0 MB 0.0 B 0.0 B 3 6 s 6 0 6 140.7 MB 0.0 B 25.9 MB 0.0 B 0.0 B 4 9 s 8 0 8 187.9 MB 0.0 B 34.6 MB 0.0 B 0.0 B 5 10 s 9 0 9 211.4 MB 0.0 B 38.9 MB 0.0 B 0.0 B For a stage on mapPartitions at VertexRDD.scala:347 343 344 /** Generates an RDD of vertex attributes suitable for shipping to the edge partitions. */ 345 private[graphx] def shipVertexAttributes( 346 shipSrc: Boolean, shipDst: Boolean): RDD[(PartitionID, VertexAttributeBlock[VD])] = { 347 partitionsRDD.mapPartitions(_.flatMap(_.shipVertexAttributes(shipSrc, shipDst))) 348 } 349 This is executed for every iteration in Pregel, so the imbalance is bad for performance. However, when run PageRank with Edge2D, the tasks are even across executors. (all finish 6 tasks) Our configuration is 6 node, 36 partitions. My questions is: What decides the number of tasks for different executors? And how to make it balance? Thanks! Larry --------------010807040009060509060807--