spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17691) Add aggregate function to collect list with maximum number of elements
Date Tue, 08 Nov 2016 12:41:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647459#comment-15647459 ]

Sean Owen commented on SPARK-17691:
-----------------------------------

I think this is in the realm of what's best accomplished with a UDF rather than another language function.

> Add aggregate function to collect list with maximum number of elements
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17691
>                 URL: https://issues.apache.org/jira/browse/SPARK-17691
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Assaf Mendelson
>            Priority: Minor
>
> One of the aggregate functions we have today is the collect_list function. This is a useful tool to do a "catch all" aggregation which doesn't really fit anywhere else.
> The problem with collect_list is that it is unbounded. I would like to see a means to do a collect_list where we limit the maximum number of elements.
> I would see that the input for this would be the maximum number of elements to use and the method of choosing (pick whatever, pick the top N, pick the bottom N).
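
For illustration only, the requested behaviour (a maximum number of elements plus a method of choosing) can be sketched in plain Python, independent of Spark. The function name collect_bounded and the method labels "any", "top" and "bottom" are hypothetical and are not part of any Spark API; this is a rough sketch of the selection logic a UDF might wrap, not a proposed implementation.

```python
import heapq
from itertools import islice


def collect_bounded(values, max_elements, method="any"):
    """Collect at most max_elements items from an iterable.

    Hypothetical helper, not a Spark function. The method labels are
    assumptions made for this sketch:
      "any"    keeps the first max_elements items encountered
      "top"    keeps the max_elements largest items, largest first
      "bottom" keeps the max_elements smallest items, smallest first
    """
    if max_elements < 0:
        raise ValueError("max_elements must be non-negative")
    if method == "any":
        # Stop reading as soon as the limit is reached.
        return list(islice(values, max_elements))
    if method == "top":
        # heapq.nlargest returns the n largest items in descending order.
        return heapq.nlargest(max_elements, values)
    if method == "bottom":
        # heapq.nsmallest returns the n smallest items in ascending order.
        return heapq.nsmallest(max_elements, values)
    raise ValueError("unknown method: %r" % (method,))


if __name__ == "__main__":
    data = [5, 1, 4, 2, 3]
    print(collect_bounded(data, 2, "any"))
    print(collect_bounded(data, 2, "top"))
    print(collect_bounded(data, 2, "bottom"))
```

In Spark, logic of this kind would run per group inside a UDF or a user-defined aggregate; how it is registered and applied there is outside the scope of this sketch.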


