spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Spark DataFrame GroupBy into List
Date Wed, 14 Oct 2015 17:15:46 GMT
That's correct. It is a Hive UDAF.
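
The thread never shows what `collect_set` returns when a group contains a repeated id. As a rough illustration only, here is a sketch that uses plain Scala collections instead of Spark or Hive (the object `CollectDemo` and its helpers are hypothetical names). `collect_set` keeps the distinct values of each group, while the related Hive UDAF `collect_list` keeps every value, duplicates included. Later Spark releases also expose both as `collect_set` and `collect_list` in `org.apache.spark.sql.functions`. The duplicate `("A", 2)` row is added to the question's data purely to show the difference.

```scala
// Plain-Scala illustration (no Spark, no Hive) of what the collect_set and
// collect_list aggregates compute per group. Helper names are hypothetical.
object CollectDemo {
  // (category, id) rows; ("A", 2) appears twice to show de-duplication.
  val rows: Seq[(String, Int)] =
    Seq(("A", 1), ("A", 2), ("A", 2), ("B", 3), ("B", 4), ("C", 5))

  // Like collect_list: every value in the group, duplicates kept.
  // (Spark itself does not guarantee element order inside the result.)
  def collectList(rs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    rs.groupBy(_._1).map { case (k, grp) => k -> grp.map(_._2) }

  // Like collect_set: only the distinct values of each group.
  def collectSet(rs: Seq[(String, Int)]): Map[String, Set[Int]] =
    rs.groupBy(_._1).map { case (k, grp) => k -> grp.map(_._2).toSet }

  def main(args: Array[String]): Unit = {
    println(collectList(rows))
    println(collectSet(rows))
  }
}
```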

On Wed, Oct 14, 2015 at 6:45 AM, java8964 <java8964@hotmail.com> wrote:

> My guess is that it's the same as the `collect_set` UDAF in Hive.
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)
>
> Yong
>
> ------------------------------
> From: sliznmailbox@gmail.com
> Date: Wed, 14 Oct 2015 02:45:48 +0000
> Subject: Re: Spark DataFrame GroupBy into List
> To: michael@databricks.com
> CC: user@spark.apache.org
>
>
> Hi Michael,
>
> Can you be more specific about `collect_set`? Is it a built-in function,
> or, if it is a UDF, how is it defined?
>
> BR,
> Todd Leo
>
> On Wed, Oct 14, 2015 at 2:12 AM Michael Armbrust <michael@databricks.com>
> wrote:
>
> import org.apache.spark.sql.functions._
>
> // collect_set is a Hive UDAF, so on Spark 1.x df must come from a HiveContext
> df.groupBy("category")
>   .agg(callUDF("collect_set", df("id")).as("id_list"))
>
> On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmailbox@gmail.com>
> wrote:
>
> Hey Spark users,
>
> I'm trying to group a DataFrame so that, instead of counting the
> occurrences, each group's values are appended to a list.
>
> Let's say we have a DataFrame as shown below:
>
> | category | id |
> | -------- |:--:|
> | A        | 1  |
> | A        | 2  |
> | B        | 3  |
> | B        | 4  |
> | C        | 5  |
>
> Ideally, after some magic group-by (a reverse explode?), I'd get:
>
> | category | id_list  |
> | -------- | -------- |
> | A        | 1,2      |
> | B        | 3,4      |
> | C        | 5        |
>
> Any tricks to achieve that? The Scala Spark API is preferred. =D
>
> BR,
> Todd Leo
>
>
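
The "reverse explode" asked for in the original question is exactly grouping into lists: exploding the grouped lists again gives back the original rows. Below is a minimal sketch of that round trip on the question's example table. It assumes plain Scala collections can stand in for the DataFrame (no Spark involved; `ReverseExplodeDemo` and its helpers are hypothetical names), and it also renders `id_list` as the comma-separated strings shown in the desired output. Sorting by category is added only to make the result deterministic.

```scala
// Plain-Scala sketch (no Spark) of grouping rows into per-key lists and
// exploding them back. All names here are hypothetical.
object ReverseExplodeDemo {
  // The (category, id) table from the question.
  val rows: Seq[(String, Int)] =
    Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5))

  // Group-into-list step: one (category, ids) entry per category, sorted by
  // category so the output is deterministic (Spark makes no such promise).
  def groupIntoLists(rs: Seq[(String, Int)]): Seq[(String, Seq[Int])] =
    rs.groupBy(_._1).toSeq.sortBy(_._1).map { case (k, grp) => (k, grp.map(_._2)) }

  // Render id_list as a comma-separated string, as in the desired table.
  def render(grouped: Seq[(String, Seq[Int])]): Seq[(String, String)] =
    grouped.map { case (k, ids) => (k, ids.mkString(",")) }

  // Explode: one output row per list element, undoing the grouping.
  def explode(grouped: Seq[(String, Seq[Int])]): Seq[(String, Int)] =
    grouped.flatMap { case (k, ids) => ids.map(id => (k, id)) }

  def main(args: Array[String]): Unit = {
    val grouped = groupIntoLists(rows)
    render(grouped).foreach { case (k, s) => println(s"$k | $s") }
  }
}
```

Because `groupBy` on a `Seq` keeps the original order inside each group, exploding the sorted groups here reproduces the input rows exactly; with real Spark data the row order after a shuffle is not guaranteed.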
