spark-issues mailing list archives

From "Luke Miner (JIRA)" <>
Subject [jira] [Commented] (SPARK-19428) Ability to select first row of groupby
Date Sat, 04 Feb 2017 18:15:52 GMT


Luke Miner commented on SPARK-19428:

That would be fantastic. Would it be possible to generalize it so that you could get more
than one row per group if needed?


I've also had the use case where people ask me to get the top ten by some criterion for each
group. For example, the 10 biggest employers in each county.
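
To pin down the semantics being asked for, here is a minimal sketch in plain Python (no Spark involved). The helper name {{top_n_per_group}}, the column names, and the sample data are all made up for illustration; this is not an existing or proposed Spark API. With n=1 it behaves like "first row of each group after ordering".

```python
import heapq
from collections import defaultdict


def top_n_per_group(rows, group_key, sort_key, n=1):
    """Return, for each group, the n rows with the largest sort_key value.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    # heapq.nlargest returns the selected rows in descending sort_key order.
    return {
        group: heapq.nlargest(n, members, key=lambda r: r[sort_key])
        for group, members in groups.items()
    }


# Made-up sample data: employers with head counts, grouped by county.
employers = [
    {"county": "Adams", "employer": "Acme", "employees": 500},
    {"county": "Adams", "employer": "Birch", "employees": 1200},
    {"county": "Adams", "employer": "Cobalt", "employees": 300},
    {"county": "Boone", "employer": "Delta", "employees": 50},
    {"county": "Boone", "employer": "Echo", "employees": 75},
]

top2 = top_n_per_group(employers, "county", "employees", n=2)
first = top_n_per_group(employers, "county", "employees", n=1)
```

In Spark itself, the usual way to get this today is a window function: number the rows with {{row_number()}} over a window partitioned by the group and ordered by the criterion, then filter on that number.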

> Ability to select first row of groupby
> --------------------------------------
>                 Key: SPARK-19428
>                 URL:
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Luke Miner
>            Priority: Minor
> It would be nice to be able to select the first row from {{GroupedData}}. Pandas has something like this:
> {{df.groupby('group').first()}}
> It's especially handy if you can order the rows within each group as well.
