spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Aggregator support in DataFrame
Date Tue, 12 Apr 2016 17:50:40 GMT
Did you see these?

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scala/typed.scala#L70
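
A minimal sketch of how the helpers in that file can be used, assuming the API as it later shipped in Spark 2.x, where the object lives in `org.apache.spark.sql.expressions.scalalang.typed` (the path in the link above differs). The `Sample` case class and its data are hypothetical and only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.scalalang.typed

// Hypothetical record type; field names follow the columns used later in this thread.
case class Sample(k: String, v1: Double, v2: Double)

object TypedHelpersSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-helpers-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Sample("a", 1.0, 2.0),
      Sample("a", 3.0, 4.0),
      Sample("b", 5.0, 6.0)
    ).toDS()

    // Each helper takes a function that picks the value to aggregate,
    // so the choice of field lives in the lambda, not in the aggregator.
    val result = ds
      .groupByKey(_.k)
      .agg(typed.sum[Sample](_.v1), typed.avg[Sample](_.v2))

    result.show()
    spark.stop()
  }
}
```

These helpers work on a typed Dataset through `groupByKey`, not on an untyped DataFrame through `groupBy("k")`, so they only partly address selecting columns by name.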

On Tue, Apr 12, 2016 at 9:46 AM, Koert Kuipers <koert@tresata.com> wrote:

> i don't really see how Aggregator can be useful for DataFrame unless you
> can specify what columns it works on. Having to code Aggregators to always
> use Row and then extract the values yourself breaks the abstraction and
> makes it not much better than UserDefinedAggregateFunction (well... maybe
> still better because i have encoders so i can use kryo).
>
> On Mon, Apr 11, 2016 at 10:53 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> saw that, don't think it solves it. i basically want to add some children
>> to the expression i guess, to indicate what i am operating on? not sure if
>> that even makes sense
>>
>> On Mon, Apr 11, 2016 at 8:04 PM, Michael Armbrust <michael@databricks.com> wrote:
>>
>>> I'll note this interface has changed recently:
>>> https://github.com/apache/spark/commit/520dde48d0d52dbbbbe1710a3275fdd5355dd69d
>>>
>>> I'm not sure that solves your problem though...
>>>
>>> On Mon, Apr 11, 2016 at 4:45 PM, Koert Kuipers <koert@tresata.com>
>>> wrote:
>>>
>>>> i like the Aggregator a lot
>>>> (org.apache.spark.sql.expressions.Aggregator), but i find the way to use it
>>>> somewhat confusing. I am supposed to simply call aggregator.toColumn, but
>>>> that doesn't allow me to specify which fields it operates on in a DataFrame.
>>>>
>>>> i would basically like to do something like
>>>> dataFrame
>>>>   .groupBy("k")
>>>>   .agg(
>>>>     myAggregator.on("v1", "v2").toColumn,
>>>>     myOtherAggregator.on("v3", "v4").toColumn
>>>>   )
>>>>
>>>
>>>
>>
>
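
For contrast with the `.on("v1", "v2")` call proposed in the original question, which is not an existing method, here is a sketch of the typed usage that `toColumn` does support. It assumes the Spark 2.x `Aggregator` interface with explicit `bufferEncoder` and `outputEncoder` members. `Record` and `SumOfProducts` are hypothetical names.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical record type mirroring the "k", "v1", "v2" columns from the question.
case class Record(k: String, v1: Double, v2: Double)

// Sums v1 * v2 per group. The fields it reads are fixed by the input type,
// which is why the same aggregator cannot be pointed at other columns.
object SumOfProducts extends Aggregator[Record, Double, Double] {
  def zero: Double = 0.0
  def reduce(acc: Double, r: Record): Double = acc + r.v1 * r.v2
  def merge(a: Double, b: Double): Double = a + b
  def finish(acc: Double): Double = acc
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object AggregatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aggregator-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1.0, 2.0), ("a", 3.0, 4.0), ("b", 5.0, 6.0))
      .toDF("k", "v1", "v2")

    // Convert the DataFrame to a typed Dataset first; toColumn then yields
    // a TypedColumn[Record, Double] that agg accepts after groupByKey.
    val result = df.as[Record]
      .groupByKey(_.k)
      .agg(SumOfProducts.toColumn)

    result.show()
    spark.stop()
  }
}
```

Reusing the aggregator on different columns in this style means first mapping the data into the aggregator's input type, which is the limitation the thread is about.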
