spark-user mailing list archives

From elliott cordo <elliottco...@gmail.com>
Subject Re: aliasing aggregate columns?
Date Fri, 17 Apr 2015 11:44:45 GMT
Ps.. forgot to mention this syntax works... but then you lose your group
by fields (which is honestly pretty weird; I'm not sure if this is as
designed or a bug?)

>>> t2 = reviews.groupBy("stars").agg(count("stars").alias("count"))

>>> t2

DataFrame[count: bigint]

On Thu, Apr 16, 2015 at 9:32 PM, elliott cordo <elliottcordo@gmail.com>
wrote:

> FYI.. the problem is that the column names Spark generates cannot be
> referenced within SQL or DataFrame operations (i.e. "SUM(cool_cnt#725)")..
> any idea how to alias these final aggregate columns?
>
> the syntax below doesn't make sense, but this is what I'd ideally want to
> do:
> .agg({"cool_cnt":"sum".alias("cool_cnt"),"*":"count".alias("cnt")})
>
> On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo <elliottcordo@gmail.com>
> wrote:
>
>> Hi Guys -
>>
>> I'm having trouble figuring out the semantics of using the alias function
>> on the final sum and count aggregations.
>>
>> >>> cool_summary = reviews.select(reviews.user_id,
>> cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt":"sum","*":"count"})
>>
>> >>> cool_summary
>>
>> DataFrame[user_id: string, SUM(cool_cnt#725): double, COUNT(1): bigint]
>>
>
>
