hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Devopam Mittra <devo...@gmail.com>
Subject Re: count(*) not allowed in order by
Date Mon, 07 Mar 2016 17:47:31 GMT
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

This should help you , try rank/ dense rank as appropriate and mold it to
best use for yourself
Regards
Dev
On Mar 7, 2016 10:35 PM, "Awhan Patnaik" <awhan@spotzot.com> wrote:

> I have to take the first 25 IDs ranked by count(*). But the following is
> not allowed in Hive
>
> select id from T order by count(*) desc limit 25;
>
> Which yields a "NOt yet supported place for UDAF count". The way around it
> is the following
>
> select id, count(*) as cnt from T group by id order by cnt desc limit 25;
>
> However I need to put this query in a subquery like so
>
> select id, XXX from T t join .... where t.id in (select id, count(*) as
> cnt from T group by id order by cnt desc limit 25) group by ... ;
>
> which does NOT work because a subquery is allowed to return only one
> thing. Here XXX are complex constructs like distance calculation and
> binning. These are time consuming and complex operations.
>
> The only way I see this would work is if I use the following sub-sub query
>
> select id, XXX from T t join ... where t.id in (select sub.id from
> (select id, count(*) as cnt from T group by id order by cnt desc limit
> 25)sub) group by ... ;
>
> The reason I don't want to put the limit in the outermost query is because
> those XXX queries are expensive and I don't want them to be performed on
> the entire result only to retain the top 25 and throw away the rest of the
> results. The count(*) operation of the rest of the IDs is not very
> expensive or time consuming.
>
> Is there some other elegant way of handling this without using a
> sub-sub-query approach?
>

Mime
View raw message