From: viirya
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...
Date: Fri, 24 Nov 2017 08:20:00 +0000 (UTC)

GitHub user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19763#discussion_r152914613

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -485,4 +485,13 @@ package object config {
         "array in the sorter.")
       .intConf
       .createWithDefault(Integer.MAX_VALUE)
+
+  private[spark] val SHUFFLE_MAP_OUTPUT_PARALLEL_AGGREGATION_THRESHOLD =
+    ConfigBuilder("spark.shuffle.mapOutput.parallelAggregationThreshold")
+      .internal()
+      .doc("Multi-thread is used when the number of mappers * shuffle partitions is greater than " +
+        "or equal to this threshold.")
--- End diff --

After rethinking this, I think the doc should also state that this threshold determines the number of threads used for parallel aggregation, so it should not be set to zero or a negative number.
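To illustrate why a threshold that also sizes the thread pool must be positive, here is a minimal Scala sketch. It assumes the thread count is computed as min(available cores, mappers * partitions / threshold + 1), so that work below the threshold stays single-threaded and work at or above it gets two or more threads. The object name `ParallelAggregationSketch`, the method `parallelism`, and its `cores` parameter are hypothetical illustrations, not Spark's API or the code in this PR.

```scala
// Hypothetical sketch (not Spark's API): one threshold both decides whether
// to aggregate in parallel and how many threads to use.
object ParallelAggregationSketch {

  /** Assumed formula: min(cores, mappers * partitions / threshold + 1).
    * A zero threshold would divide by zero, and a negative one can yield a
    * non-positive thread count, so both are rejected up front. */
  def parallelism(
      numMappers: Int,
      numPartitions: Int,
      threshold: Long,
      cores: Int = Runtime.getRuntime.availableProcessors()): Int = {
    require(threshold > 0, s"threshold must be positive, got $threshold")
    // Multiply as Long: mappers * partitions can overflow Int.
    val work = numMappers.toLong * numPartitions
    // work < threshold gives 1 (single-threaded); otherwise at least 2,
    // capped by the number of available cores.
    math.min(cores.toLong, work / threshold + 1).toInt
  }

  def main(args: Array[String]): Unit = {
    // 1e6 / 1e7 + 1 = 1: below the threshold, so single-threaded
    println(parallelism(1000, 1000, 10000000L, cores = 8))
    // 1e8 / 1e7 + 1 = 11, capped at 8 cores
    println(parallelism(10000, 10000, 10000000L, cores = 8))
  }
}
```

Under this assumed formula, validating the value when it is read keeps the division well-defined and guarantees at least one thread, which matches the reviewer's point that zero and negative values are not meaningful settings.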