spark-user mailing list archives

From Chirag Aggarwal <Chirag.Aggar...@guavus.com>
Subject Value of SHUFFLE_PARTITIONS
Date Mon, 01 Sep 2014 12:02:18 GMT
Hi,

Currently, the number of shuffle partitions is a config-driven parameter (SHUFFLE_PARTITIONS).
This means that anyone running a spark-sql query must first analyze which value of
SHUFFLE_PARTITIONS would give the best performance for that query.

Shouldn't there be logic in SparkSql that can figure out the best value automatically,
while still providing a mechanism to give preference to a user-specified value?
I believe this could be worked out on the basis of the number of partitions in the original data.

I ran some queries with the default value (200) of shuffle partitioning, and when I changed
this value to 5, the time taken by the query dropped by nearly 35%.
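
For context, the setting referred to above is exposed as the `spark.sql.shuffle.partitions` config key, which can be overridden per session; a minimal sketch in the spark-sql shell (the table name is just a placeholder, and 5 is the value from my test):

```sql
-- Override the shuffle-partition count for this session
-- (defaults to 200; this is the config key behind SHUFFLE_PARTITIONS).
SET spark.sql.shuffle.partitions=5;

-- Queries involving a shuffle (joins, aggregations) will now
-- use 5 reduce-side partitions instead of 200.
SELECT dept, COUNT(*) FROM employees GROUP BY dept;
```

With small inputs, far fewer than 200 partitions avoids the overhead of scheduling many nearly empty tasks, which is consistent with the speedup observed above.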

Thanks,
Chirag
