Date: Mon, 19 Oct 2015 10:44:49 +0530
Subject: Re: repartition vs partitionby
From: shahid ashraf
To: Adrian Tanase
Cc: Raghavendra Pandey, shahid qadri, User <user@spark.apache.org>

Yes, I am trying to do so, but that would repartition the whole data set.
Can't we split a single large (skewed) partition into multiple partitions?
Any ideas on this?

On Sun, Oct 18, 2015 at 1:55 AM, Adrian Tanase wrote:

> If the dataset allows it, you can try to write a custom partitioner to
> help Spark distribute the data more uniformly.
>
> Sent from my iPhone
>
> On 17 Oct 2015, at 16:14, shahid ashraf wrote:
>
> Yes, I know about that; it is for reducing the number of partitions. The
> point here is that the data is skewed towards a few partitions.
>
> On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey <
> raghavendra.pandey@gmail.com> wrote:
>
>> You can use the coalesce function if you want to reduce the number of
>> partitions.
>> This one minimizes the data shuffle.
>>
>> -Raghav
>>
>> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri wrote:
>>
>>> Hi folks,
>>>
>>> I need to repartition a large data set (around 300 GB), as I see some
>>> partitions hold far more data than others (data skew).
>>>
>>> I have pair RDDs: [({},{}),({},{}),({},{})]
>>>
>>> What is the best way to solve this problem?
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>
>
> --
> with Regards
> Shahid Ashraf

--
with Regards
Shahid Ashraf
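The split asked about at the top of the thread cannot be done by a partitioner alone, because a (hash) partitioner must send all records with a given key to the same partition. The usual workaround is "salting": rewrite each hot key as (key, random_bucket) before the shuffle so one skewed key spreads over several partitions. A minimal pure-Python sketch of the idea (the key names and bucket count are illustrative, not from the thread):

```python
import random

# Keys known to be skewed, and how many buckets to spread each one over
# (both values are made-up examples for illustration).
HOT_KEYS = {"popular"}
SALT_BUCKETS = 4

def salt_key(key):
    """Rewrite a hot key as (key, bucket) so its records hash to
    SALT_BUCKETS different partitions instead of one."""
    if key in HOT_KEYS:
        return (key, random.randrange(SALT_BUCKETS))
    return (key, 0)  # non-hot keys get a fixed bucket

def unsalt_key(salted):
    """Recover the original key after the per-bucket aggregation step."""
    return salted[0]

# A tiny skewed data set: one hot key, one rare key.
records = [("popular", i) for i in range(8)] + [("rare", 0)]
salted = [(salt_key(k), v) for k, v in records]
```

After aggregating on the salted keys, a second, much cheaper aggregation on `unsalt_key` merges the per-bucket partial results back to one value per original key.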
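The custom-partitioner suggestion in the thread comes down to overriding how keys map to partition indices. The sketch below shows one possible mapping policy in plain Python: reserve a few partitions for known hot keys so they do not pile onto the same partition as everything else. The policy, the key names, and the function name are all illustrative assumptions, not from the thread or the Spark API.

```python
def skew_aware_partition(key, num_partitions, reserved_for_hot=2,
                         hot_keys=frozenset({"popular"})):
    """Map a key to a partition index in [0, num_partitions).

    Hot keys are confined to the first `reserved_for_hot` partitions;
    all other keys hash into the remaining range, so ordinary keys never
    share a partition with a hot key. (Illustrative policy only.)
    """
    if key in hot_keys:
        return hash(key) % reserved_for_hot
    return reserved_for_hot + hash(key) % (num_partitions - reserved_for_hot)

# Example: place a mix of hot and ordinary keys into 8 partitions.
num_parts = 8
indices = {k: skew_aware_partition(k, num_parts)
           for k in ["popular", "alpha", "beta", "gamma"]}
```

In PySpark such a function could be passed as the partition function of `RDD.partitionBy`; in Scala the equivalent is subclassing `Partitioner`. Note this isolates hot keys but still cannot split one key across partitions; for that, the salting approach above is needed.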