spark-user mailing list archives

From Deng Ching-Mallete <odeach...@gmail.com>
Subject Re: hiveContext sql number of tasks
Date Wed, 07 Oct 2015 14:37:26 GMT
Hi,

You can call coalesce(N) after loading the data, where N is the number of
partitions you want it reduced to.
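
A minimal sketch of what this could look like (the table name and N=200 are
made-up placeholders; this assumes Spark 1.4+, where the DataFrame returned
by hiveContext.sql has its own coalesce, and an existing SparkContext `sc`):

```scala
import org.apache.spark.sql.hive.HiveContext

// Hypothetical setup: `sc` is an existing SparkContext.
val hiveContext = new HiveContext(sc)

// Placeholder query; in your case this is hiveContext.sql(sqlText).
val df = hiveContext.sql("SELECT * FROM my_partitioned_orc_table")

// coalesce(N) lowers the partition count without a full shuffle,
// so downstream stages run with N tasks instead of ~10,000.
// (df.rdd.coalesce(200) works the same way on the underlying RDD;
// use repartition(N) instead if you need evenly sized partitions,
// at the cost of a shuffle.)
val coalesced = df.coalesce(200)
```

Note this only reduces the tasks in stages after the coalesce; the initial
scan stage is still driven by the number of input splits.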

HTH,
Deng

On Wed, Oct 7, 2015 at 6:34 PM, patcharee <Patcharee.Thongtra@uni.no> wrote:

> Hi,
>
> I run a SQL query over about 10,000 partitioned ORC files. Because of the
> partitioning scheme, the files cannot be merged any further (to reduce the
> total number).
>
> From the command hiveContext.sql(sqlText), 10K tasks were created to
> handle each file. Is it possible to use fewer tasks? How can I force Spark
> SQL to use fewer tasks?
>
> BR,
> Patcharee
>
