From: Alexander Pivovarov
Date: Wed, 8 Jun 2016 21:51:43 -0700
Subject: Re: rdd.distinct with Partitioner
To: 汪洋
Cc: dev
Delivered-To: mailing list dev@spark.apache.org

reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.

Why does reduceByKey with a Partitioner exist, then?

On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 <tiandiwoxin@icloud.com> wrote:

> Hi Alexander,
>
> I think the result is not guaranteed to be correct if an arbitrary
> Partitioner is passed in.
>
> I have created a notebook and you can check it out.
> (https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366/latest.html)
>
> Best regards,
>
> Yang
>
> On June 9, 2016, at 11:42 AM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>
> Most of the RDD methods that shuffle data take a Partitioner as a parameter.
>
> But rdd.distinct does not have such a signature.
>
> Should I open a PR for that?
>
> /**
>  * Return a new RDD containing the distinct elements in this RDD.
>  */
> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
> }
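The behaviour discussed in this thread can be reproduced without a cluster. What follows is only a rough plain-Scala sketch, not Spark code: `PartitionerDemo`, `Part`, `hashPart`, `randomPart`, and the simulated `reduceByKey` and `distinctWith` are hypothetical names made up for illustration. It assumes that a shuffle routes each record to the partition chosen by the partitioner and then reduces each partition independently, so a partitioner that does not send equal keys to the same partition leaves several partial results per key.

```scala
import scala.util.Random

object PartitionerDemo {
  // Hypothetical stand-in for a partitioner: maps a key to a partition index.
  type Part[K] = K => Int

  // Deterministic: equal keys always map to the same partition.
  def hashPart[K](numPartitions: Int): Part[K] =
    key => Math.floorMod(key.hashCode, numPartitions)

  // Non-deterministic: ignores the key, so equal keys can be split up.
  def randomPart[K](numPartitions: Int, rng: Random): Part[K] =
    _ => rng.nextInt(numPartitions)

  // Simulated reduceByKey: route every record to a partition, then reduce
  // the values per key inside each partition independently.
  def reduceByKey[K, V](records: Seq[(K, V)], part: Part[K])(f: (V, V) => V): Seq[(K, V)] =
    records
      .groupBy { case (k, _) => part(k) } // one bucket per partition
      .values
      .flatMap { bucket =>
        bucket.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
      }
      .toSeq

  // The distinct-via-reduceByKey pattern from the message above.
  def distinctWith[T](xs: Seq[T], part: Part[T]): Seq[T] =
    reduceByKey(xs.map(x => (x, ())), part)((x, _) => x).map(_._1)

  def main(args: Array[String]): Unit = {
    val data = Seq.fill(100)(Seq(1, 2, 3)).flatten

    // Key-based partitioner: one result per key.
    println(distinctWith(data, hashPart[Int](4)).sorted.mkString(", ")) // prints: 1, 2, 3

    // Random partitioner: the same key survives in several partitions,
    // so the "distinct" result still contains duplicates.
    println(distinctWith(data, randomPart[Int](4, new Random(42))).sorted.mkString(", "))
  }
}
```

In this sketch the partitioner only decides where a key's records are reduced; with the key-based one, every record for a key meets in a single partition and the result is correct for any partition count. That suggests the Partitioner overloads are meant for controlling the number and layout of output partitions, not for arbitrary routing.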