spark-user mailing list archives

From Kevin Burton
Subject Selecting the top 100 records per group by?
Date Sun, 11 Sep 2016 01:04:18 GMT
I'm trying to figure out a way to group by a key and return the top 100
records in each group.

Something like:

SELECT TOP(100, user_id) FROM posts GROUP BY user_id;

But I can't really figure out the best way to do this...

There are FIRST and LAST aggregate functions, but each only returns one
value per group.

I could do something like:

SELECT * FROM posts WHERE user_id IN ( /* select top users here */ ) LIMIT

But that limit applies to ALL the records, not to each individual user.

The only other thing I can think of is to do a manual map reduce and then
have the reducer only return the top 100 each time...
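
To make that manual approach concrete, here is a minimal sketch in plain
Python (not Spark). It assumes each post is a (user_id, post_id, score)
tuple and that "top" means highest score. The function name
top_n_per_group, the sample data, and the score field are all hypothetical,
only there to illustrate the idea:

```python
import heapq
from collections import defaultdict


def top_n_per_group(records, n, group_key, rank_key):
    """Return {group: [records]} with the n highest-ranked records per group."""
    # "Map" step: bucket every record under its group key.
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record)
    # "Reduce" step: keep only the n largest records of each bucket,
    # ordered from highest to lowest rank.
    return {
        group: heapq.nlargest(n, bucket, key=rank_key)
        for group, bucket in groups.items()
    }


# Hypothetical sample data: (user_id, post_id, score)
posts = [
    ("alice", 1, 10),
    ("alice", 2, 30),
    ("alice", 3, 20),
    ("bob", 4, 5),
    ("bob", 5, 50),
]

# Top 2 posts per user instead of 100, to keep the example small.
top = top_n_per_group(
    posts, n=2, group_key=lambda p: p[0], rank_key=lambda p: p[2]
)
```

This buckets everything in memory on one machine, so it only shows the
shape of the reducer logic, not how it would be distributed.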

Would LOVE some advice here...

