spark-dev mailing list archives

From "write2sivakumar@gmail" <write2sivaku...@gmail.com>
Subject Re: aggregateByKey on PairRDD
Date Thu, 31 Mar 2016 02:58:21 GMT

    
Hi,

We can use combineByKey to achieve this:

val finalRDD = tempRDD.combineByKey(
  (x: (Any, Any)) => x,
  (acc: (Any, Any), x) => (acc, x),
  (acc1: (Any, Any), acc2: (Any, Any)) => (acc1, acc2))
finalRDD.collect.foreach(println)

Output:

(amazon,((book1,tech),(book2,tech)))
(barns&noble,(book,tech))
(eBay,(book1,tech))

Thanks,
Sivakumar
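
The combineByKey call above pairs values up as nested tuples typed (Any, Any), so a key with three or more values would nest further rather than produce a flat list. A minimal sketch of the same idea that accumulates into a List instead, assuming a local SparkContext; the object name CombineByBrand, the helper groupItems, the Item alias, the app name, and the local[2] master are illustrative and not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CombineByBrand {
  // (product, key), matching the value type of tempRDD in the thread
  type Item = (String, String)

  // Hypothetical helper: collect every Item for a brand into one List.
  def groupItems(rdd: RDD[(String, Item)]): RDD[(String, List[Item])] =
    rdd.combineByKey[List[Item]](
      (v: Item) => List(v),                      // first value seen for a key in a partition
      (acc: List[Item], v: Item) => v :: acc,    // further values in the same partition
      (a: List[Item], b: List[Item]) => a ::: b  // merge partial lists across partitions
    )

  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for a standalone run.
    val sc = new SparkContext(
      new SparkConf().setAppName("combine-by-brand").setMaster("local[2]"))
    try {
      val tempRDD = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))))
      // The order of keys, and of items within each list, is not guaranteed.
      groupItems(tempRDD).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With this shape the result type is RDD[(String, List[(String, String)])], so the element types are preserved instead of being widened to Any.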

-------- Original message --------
From: Daniel Haviv <daniel.haviv@veracity-group.com> 
Date: 30/03/2016  18:58  (GMT+08:00) 
To: Akhil Das <akhil@sigmoidanalytics.com> 
Cc: Suniti Singh <suniti.singh@gmail.com>, user@spark.apache.org, dev <dev@spark.apache.org>
Subject: Re: aggregateByKey on PairRDD

Hi,
Shouldn't groupByKey be avoided (https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?

Thank you,
Daniel
On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
Isn't that what tempRDD.groupByKey does?

Thanks
Best Regards

On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <suniti.singh@gmail.com> wrote:
Hi All,
I have an RDD with data in the following form:

tempRDD: RDD[(String, (String, String))]
(brand, (product, key))
("amazon", ("book1", "tech"))
("eBay", ("book1", "tech"))
("barns&noble", ("book", "tech"))
("amazon", ("book2", "tech"))

I would like to group the data by brand and get the result set in the following format:

resultSetRDD: RDD[(String, List[(String, String)])]

I tried using aggregateByKey but am not quite sure how to achieve this. Or is there any other way to achieve this?

val resultSetRDD = tempRDD.aggregateByKey("")(
  { case (aggr, value) => aggr + String.valueOf(value) + "," },
  (aggr1, aggr2) => aggr1 + aggr2)

resultSetRDD = (amazon,("book1","tech"),("book2","tech"))

Thanks,
Suniti
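
In the aggregateByKey attempt above, the zero value is "", so the aggregated type is String and each brand ends up as one concatenated string. A minimal sketch of the same call with an empty List as the zero value, which would yield a List per brand; the object name AggregateByBrand, the helper groupItems, the Item alias, the app name, and the local[2] master are assumptions for illustration and not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AggregateByBrand {
  // (product, key), matching the value type of tempRDD in the question
  type Item = (String, String)

  // Hypothetical helper: the zero value fixes the result type to List[Item].
  def groupItems(rdd: RDD[(String, Item)]): RDD[(String, List[Item])] =
    rdd.aggregateByKey(List.empty[Item])(
      (acc: List[Item], v: Item) => v :: acc,    // seqOp: fold one value into a partition-local list
      (a: List[Item], b: List[Item]) => a ::: b  // combOp: merge lists from different partitions
    )

  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for a standalone run.
    val sc = new SparkContext(
      new SparkConf().setAppName("aggregate-by-brand").setMaster("local[2]"))
    try {
      val tempRDD = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))))
      // The order of keys, and of items within each list, is not guaranteed.
      groupItems(tempRDD).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because every value is kept, this does not shrink the shuffled data the way a true reduction such as a sum or count would.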



