I have to take the first 25 IDs ranked by count(*). But the following is not allowed in Hive

select id from T order by count(*) desc limit 25;

Which yields a "NOt yet supported place for UDAF count". The way around it is the following

select id, count(*) as cnt from T group by id order by cnt desc limit 25;

However I need to put this query in a subquery like so

select id, XXX from T t join .... where t.id in (select id, count(*) as cnt from T group by id order by cnt desc limit 25) group by ... ;

which does NOT work because a subquery is allowed to return only one thing. Here XXX are complex constructs like distance calculation and binning. These are time consuming and complex operations.

The only way I see this would work is if I use the following sub-sub query

select id, XXX from T t join ... where t.id in (select sub.id from (select id, count(*) as cnt from T group by id order by cnt desc limit 25)sub) group by ... ;

The reason I don't want to put the limit in the outermost query is because those XXX queries are expensive and I don't want them to be performed on the entire result only to retain the top 25 and throw away the rest of the results. The count(*) operation of the rest of the IDs is not very expensive or time consuming.

Is there some other elegant way of handling this without using a sub-sub-query approach?