Subject: Re: reduce, transform, combine
From: DB Tsai
To: dev@spark.apache.org
Date: Sun, 4 May 2014 01:40:32 -0700

You could easily achieve this with mapPartitions. However, it does not seem
possible with the existing aggregate-style operations. I can see that it
would be a generally useful operation; for now, mapPartitions should work.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Sun, May 4, 2014 at 1:12 AM, Manish Amde wrote:
> I am currently using the RDD aggregate operation to fold per partition
> and then combine the per-partition results:
>
> def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U
>
> I need to perform a transform operation after the seqOp and before the
> combOp. The signature would look like:
>
> def foldTransformCombine[U: ClassTag](zeroReduceValue: V, zeroCombineValue: U)(seqOp: (V, T) => V, transformOp: (V) => U, combOp: (U, U) => U): U
>
> This is especially useful when the transformOp is expensive and should be
> performed only once per partition before combining. Is there a way to
> accomplish this with existing RDD operations? If so, great; if not, should
> we consider adding such a general transformation to the list of RDD
> operations?
>
> -Manish
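[Editor's note: the mapPartitions pattern suggested above can be sketched without a Spark cluster by simulating partitions with nested Scala collections. The helper name foldTransformCombine follows Manish's proposed signature; this is a hedged illustration of the pattern, not Spark API.]

```scala
object FoldTransformCombineSketch {
  // Sketch of the proposed operation, with partitions simulated as Seq[Seq[T]].
  // The equivalent Spark pattern (assuming an RDD `rdd`) would be roughly:
  //   rdd.mapPartitions(iter => Iterator(transformOp(iter.foldLeft(zeroValue)(seqOp))))
  //      .reduce(combOp)
  // so transformOp runs exactly once per partition, before the combine step.
  def foldTransformCombine[T, V, U](partitions: Seq[Seq[T]], zeroValue: V)(
      seqOp: (V, T) => V,
      transformOp: V => U,
      combOp: (U, U) => U): U =
    partitions
      .map(p => transformOp(p.foldLeft(zeroValue)(seqOp))) // fold, then transform once per partition
      .reduce(combOp)                                      // combine transformed partition results

  def main(args: Array[String]): Unit = {
    // Example: sum each "partition", square the per-partition sum once,
    // then add the squares across partitions: (1+2+3)^2 + (4+5)^2 = 117.
    val parts = Seq(Seq(1, 2, 3), Seq(4, 5))
    val result = foldTransformCombine[Int, Int, Int](parts, 0)(
      (acc, x) => acc + x,
      s => s * s,
      (a, b) => a + b)
    println(result) // 117
  }
}
```

With plain aggregate, the squaring would have to be folded into combOp and applied to already-combined values, which changes the semantics; keeping transformOp as a separate per-partition step is what the proposed signature buys.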