spark-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: Spark DataFrame GroupBy into List
Date Wed, 14 Oct 2015 13:45:44 GMT
My guess is that it is the same as Hive's collect_set UDAF:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)
Yong
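(For illustration only: a minimal plain-Scala sketch of what Hive's collect_set does, namely grouping rows by key and keeping the distinct values per key. This is not the Spark or Hive API, just the semantics, and the object/method names here are made up for the example.)

```scala
// Plain-Scala sketch of collect_set semantics (hypothetical names,
// not Spark/Hive API): group (key, value) rows by key and keep the
// distinct values per key, i.e. duplicates within a group are dropped.
object CollectSetSketch {
  def collectSet[K, V](rows: Seq[(K, V)]): Map[K, Set[V]] =
    rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }
}
```

e.g. collectSet(Seq(("A", 1), ("A", 1), ("A", 2))) yields Map("A" -> Set(1, 2)) -- the duplicate 1 is dropped, which is the set (as opposed to list) behavior.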

From: sliznmailbox@gmail.com
Date: Wed, 14 Oct 2015 02:45:48 +0000
Subject: Re: Spark DataFrame GroupBy into List
To: michael@databricks.com
CC: user@spark.apache.org

Hi Michael, 
Can you be more specific on `collect_set`? Is it a built-in function or, if it is a UDF, how is it defined?
BR,
Todd Leo
On Wed, Oct 14, 2015 at 2:12 AM Michael Armbrust <michael@databricks.com> wrote:
import org.apache.spark.sql.functions._
df.groupBy("category")
  .agg(callUDF("collect_set", df("id")).as("id_list"))
On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmailbox@gmail.com> wrote:
Hey Spark users,
I'm trying to group a DataFrame, collecting the occurrences of each key into a list instead of counting them.

Let's say we have a DataFrame as shown below:

| category | id |
| -------- |:--:|
| A        | 1  |
| A        | 2  |
| B        | 3  |
| B        | 4  |
| C        | 5  |
Ideally, after some magic group-by (a reverse explode?):

| category | id_list  |
| -------- | -------- |
| A        | 1,2      |
| B        | 3,4      |
| C        | 5        |
Any tricks to achieve that? The Scala Spark API is preferred. =D
BR,
Todd Leo
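(For what it's worth, the desired "reverse explode" can be sketched with plain Scala collections. This is only an illustration of the target shape, not the Spark API; the object/method names are made up for the example.)

```scala
// Plain-Scala sketch of the desired grouping (hypothetical names, not
// Spark API): collect the values per key into a list, preserving the
// order in which they appeared in the input.
object GroupIntoList {
  def groupIntoList[K, V](rows: Seq[(K, V)]): Map[K, Seq[V]] =
    rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
}
```

e.g. groupIntoList(Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5))) yields Map("A" -> Seq(1, 2), "B" -> Seq(3, 4), "C" -> Seq(5)), matching the id_list table above.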