spark-user mailing list archives

From Raghavendra Pandey <raghavendra.pan...@gmail.com>
Subject Re: repartition vs partitionby
Date Sat, 17 Oct 2015 12:57:37 GMT
You can use the coalesce function if you want to reduce the number of
partitions. It minimizes data shuffling.

-Raghav
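
To make the trade-off concrete, here is a toy model in plain Python. It is not Spark code: the helper names `coalesce`, `repartition`, `partition_by`, and `sizes` are hypothetical stand-ins that only mimic the general behaviour of the RDD methods `coalesce`, `repartition`, and `partitionBy`, and the grouping and round-robin rules are simplifying assumptions. It suggests why merging whole partitions avoids a shuffle but leaves skew in place, and why hash partitioning cannot spread out a single hot key.

```python
# Toy model only: a "partition" is a list of (key, value) records.
# None of these helpers are Spark APIs; they are illustrative assumptions.

def coalesce(partitions, n):
    """Merge whole partitions into n groups; no partition is ever split."""
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # assumed grouping rule, not Spark's
    return merged

def repartition(partitions, n):
    """Full shuffle: deal every record round-robin over n partitions."""
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for rec in part:
            out[i % n].append(rec)
            i += 1
    return out

def partition_by(partitions, n, key_func=hash):
    """Full shuffle by key: equal keys always land in the same partition."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for key, value in part:
            out[key_func(key) % n].append((key, value))
    return out

def sizes(partitions):
    """Number of records in each partition."""
    return [len(p) for p in partitions]

# 100 records with distinct integer keys, but one oversized partition.
skewed = [
    [(k, k) for k in range(90)],
    [(k, k) for k in range(90, 93)],
    [(k, k) for k in range(93, 96)],
    [(k, k) for k in range(96, 100)],
]

# 100 records where 90 of them share the same key (a "hot" key).
hot = [
    [(0, v) for v in range(90)],
    [(k, k) for k in range(1, 11)],
]

print(sizes(coalesce(skewed, 2)))     # [93, 7]: fewer partitions, still skewed
print(sizes(repartition(skewed, 4)))  # [25, 25, 25, 25]: balanced after shuffle
print(sizes(partition_by(hot, 4)))    # [92, 3, 3, 2]: hot key stays together
```

In this model, coalescing is cheap because records stay with their original partition, so it does not rebalance anything. A full shuffle balances the distinct-key data, while key-based partitioning remains lopsided whenever one key dominates.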

On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashraff@icloud.com>
wrote:

> Hi folks
>
> I need to repartition a large data set (around 300 GB), as I see that some
> partitions hold much more data than others (data skew).
>
> I have pair RDDs of the form [({},{}),({},{}),({},{})]
>
> What is the best way to solve this problem?
