That's correct. It is a Hive UDAF.

On Wed, Oct 14, 2015 at 6:45 AM, java8964 <java8964@hotmail.com> wrote:
My guess is that it is the same as Hive's collect_set UDAF.


Yong


From: sliznmailbox@gmail.com
Date: Wed, 14 Oct 2015 02:45:48 +0000
Subject: Re: Spark DataFrame GroupBy into List
To: michael@databricks.com
CC: user@spark.apache.org


Hi Michael, 

Can you be more specific on `collect_set`? Is it a built-in function or, if it is a UDF, how is it defined?

BR,
Todd Leo

On Wed, Oct 14, 2015 at 2:12 AM Michael Armbrust <michael@databricks.com> wrote:
import org.apache.spark.sql.functions._

df.groupBy("category")
  .agg(callUDF("collect_set", df("id")).as("id_list"))
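
For readers on later releases: `collect_set` (and `collect_list`) were also added as built-in aggregates in `org.apache.spark.sql.functions` from Spark 1.6 on, so `callUDF` is only needed on 1.5.x. To make the semantics concrete without a Spark cluster, here is a quick sketch of the same "group into a set of ids" logic on plain Scala collections (illustration only, not the DataFrame API):

```scala
// Sample rows mirroring the (category, id) table from the question.
val rows = Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5))

// Group by category, then keep the distinct ids per group --
// analogous to GROUP BY category + collect_set(id) in SQL.
val idLists: Map[String, Set[Int]] =
  rows.groupBy(_._1).map { case (cat, pairs) => cat -> pairs.map(_._2).toSet }

idLists.toSeq.sortBy(_._1).foreach { case (cat, ids) =>
  println(s"$cat -> ${ids.toSeq.sorted.mkString(",")}")
}
// prints:
// A -> 1,2
// B -> 3,4
// C -> 5
```

The DataFrame version distributes the same aggregation across partitions, but the per-group result is the same shape.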

On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmailbox@gmail.com> wrote:
Hey Spark users,

I'm trying to group a DataFrame, collecting the occurrences into a list instead of counting them.

Let's say we have a dataframe as shown below:
| category | id |
| -------- |:--:|
| A        | 1  |
| A        | 2  |
| B        | 3  |
| B        | 4  |
| C        | 5  |
Ideally, after some magic group-by (a reverse explode?):
| category | id_list  |
| -------- | -------- |
| A        | 1,2      |
| B        | 3,4      |
| C        | 5        |
Any tricks to achieve that? The Scala Spark API is preferred. =D

BR,
Todd Leo