spark-user mailing list archives

From shahid ashraf <sha...@trialx.com>
Subject Re: repartition vs partitionby
Date Mon, 19 Oct 2015 05:14:49 GMT
Yes, I am trying to do so, but that would repartition the whole dataset. Can't
we split a large partition (a data-skewed partition) into multiple partitions?
Any ideas on this?
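One common way to split a single hot key across several partitions is key salting: attach a small salt value to each key, aggregate per (key, salt), then drop the salt and aggregate again. The sketch below is plain Python so it runs without a cluster. The names salt, reduce_by_key, salted_reduce, largest_group and NUM_SALTS are hypothetical, and reduce_by_key is only a stand-in for RDD.reduceByKey. It assumes the aggregation is associative (like a sum), so the two stages give the same result as a single one.

```python
import operator
from collections import Counter

# Assumption: split every key 8 ways; tune this to how skewed the hot keys are.
NUM_SALTS = 8


def salt(records, num_salts=NUM_SALTS):
    """(k, v) -> ((k, s), v), so one hot key becomes num_salts smaller keys.

    The salt is the record index modulo num_salts to keep the demo reproducible;
    drawing random.randrange(num_salts) per record is the more usual choice.
    """
    return [((k, i % num_salts), v) for i, (k, v) in enumerate(records)]


def reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey: fold values per key with func."""
    acc = {}
    for k, v in pairs:
        acc[k] = func(acc[k], v) if k in acc else v
    return acc


def salted_reduce(records, func, num_salts=NUM_SALTS):
    # Stage 1: aggregate per (key, salt); the hot key's records are split num_salts ways.
    partial = reduce_by_key(salt(records, num_salts), func)
    # Stage 2: drop the salt and combine the (at most num_salts) partial results per key.
    return reduce_by_key(((k, v) for (k, _s), v in partial.items()), func)


def largest_group(pairs):
    """Records sharing the most common key: a lower bound on the largest
    partition after hash partitioning, since one key always lands in one partition."""
    return max(Counter(k for k, _ in pairs).values())


if __name__ == "__main__":
    data = [("hot", 1)] * 1000 + [("a", 1), ("b", 1), ("c", 1)]
    print(largest_group(data))                 # 1000
    print(largest_group(salt(data)))           # 125
    print(salted_reduce(data, operator.add))   # {'hot': 1000, 'a': 1, 'b': 1, 'c': 1}
```

In PySpark the two stages would roughly map onto a map() that adds the salt followed by reduceByKey(f), then a second map() that drops the salt followed by another reduceByKey(f). Joins need the other side replicated once per salt value, which this sketch does not cover.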

On Sun, Oct 18, 2015 at 1:55 AM, Adrian Tanase <atanase@adobe.com> wrote:

> If the dataset allows it, you can try writing a custom partitioner to help
> Spark distribute the data more uniformly.
>
> On 17 Oct 2015, at 16:14, shahid ashraf <shahid@trialx.com> wrote:
>
> Yes, I know about that, but it is meant for reducing the number of partitions.
> The point here is that the data is skewed into a few partitions.
>
>
> On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey <
> raghavendra.pandey@gmail.com> wrote:
>
>> You can use the coalesce function if you want to reduce the number of
>> partitions. It minimizes data shuffling.
>>
>> -Raghav
>>
>> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashraff@icloud.com>
>> wrote:
>>
>>> Hi folks
>>>
>>> I need to repartition a large dataset (around 300 GB), as I see that some
>>> partitions hold much more data than others (data skew).
>>>
>>> I have pair RDDs of the form [({},{}), ({},{}), ({},{})].
>>>
>>> What is the best way to solve this problem?
>>>
>>>
>>
>
>
> --
> with Regards
> Shahid Ashraf
>
>
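A custom partitioner, as suggested above, can keep a few known heavy keys from sharing partitions with everything else. The sketch below builds a key-to-partition function: each hot key gets a partition of its own, and all other keys are hashed over the remaining partitions. The names make_skew_aware_partitioner and partition_func, and the idea of knowing the hot keys up front (e.g. from a sample), are assumptions for illustration. The function returns an index in [0, num_partitions), so it should be usable as the partitionFunc argument of PySpark's RDD.partitionBy.

```python
import zlib


def make_skew_aware_partitioner(hot_keys, num_partitions):
    """Return a key -> partition-index function (hypothetical helper).

    Hot keys get dedicated partitions 0..len(hot_keys)-1; every other key is
    hashed over the remaining partitions, so no ordinary key shares a
    partition with a hot one.
    """
    hot_keys = list(hot_keys)
    if num_partitions <= len(hot_keys):
        raise ValueError("num_partitions must exceed the number of hot keys")
    dedicated = {k: i for i, k in enumerate(hot_keys)}
    rest = num_partitions - len(hot_keys)

    def partition_func(key):
        if key in dedicated:
            return dedicated[key]
        # crc32 of repr(key) is stable across Python processes, unlike hash() on
        # strings, which is randomized per process by default.
        digest = zlib.crc32(repr(key).encode("utf-8"))
        return len(dedicated) + digest % rest

    return partition_func


if __name__ == "__main__":
    pf = make_skew_aware_partitioner(["hot1", "hot2"], 10)
    print(pf("hot1"), pf("hot2"), pf("some_other_key"))
    # With PySpark available (pair_rdd is hypothetical): pair_rdd.partitionBy(10, pf)
```

A partitioner still sends every record of one key to a single partition, so this helps when several moderately heavy keys pile into the same partition; a single key larger than one partition still needs something like salting.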


-- 
with Regards
Shahid Ashraf
