hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Manish <maa...@gmail.com>
Subject Re: hive query to calculate percentage
Date Thu, 27 Feb 2014 00:10:30 GMT
Yong,
Thanks for your response..

I am getting an error :

FAILED: SemanticException [Error 10002]: Line 11:28 Invalid column 
reference 'timestamp_dt'


Sorry I didn't mention earlier, the  table is partitioned on "timestamp_dt "


On 2/25/14, 5:57 PM, java8964 wrote:
> one query won't work, as totalcount is not in "group by".
>
> You have 2 options:
>
> 1) use the sub query
>
> select a.timestamp_dt, a.totalcount/b.total_sum
> from daily_count_per_kg_domain a join
> (
> select timestamp_dt, sum(totalcount) as total_sum
> from
> daily_count_per_kg_domain
> group by timestamp_dt
> ) b on (a.timestamp_dt = b.timestamp_dt)
>
> 2) If you are using hive 11 or above, using windows functions.
>
> Yong
>
> ------------------------------------------------------------------------
> Date: Tue, 25 Feb 2014 18:27:34 -0600
> Subject: Re: hive query to calculate percentage
> From: kkrishnan88@gmail.com
> To: user@hive.apache.org
>
> Modfiy the query to :
> select totalcount / sum(totalcount) from daily_count_per_kg_domain 
> where timestamp_dt = '20140219' group by timestamp_dt;
>
> if you dont specify the where clause, you will get result for all 
> partitions.
>
>
> On Tue, Feb 25, 2014 at 3:14 PM, Manish <maaand@gmail.com 
> <mailto:maaand@gmail.com>> wrote:
>
>     I have a partitioned table  on timestamp_dt:
>
>     > desc daily_count_per_kg_domain;
>     OK
>     ddnamesyskg     string
>     totalcount      int
>     timestamp_dt    string
>
>     hive> select * from daily_count_per_kg_domain;
>     OK
>     sys_kg_band     224     20140219
>     sys_kg_event    3435    20140219
>     sys_kg_movies   44987   20140219
>     sys_kg_oly      4172    20140219
>     sys_kg_sp_countr        5499    20140219
>     sys_kg_sports   3954    20140219
>     sys_kg_tv       21387   20140219
>     sys_kg_venue    152     20140219
>     sys_kgathlete   9000    20140219
>     sys_kgpeople    300064  20140219
>
>
>     Looking to compute percentages for each row, (per day):
>
>     100* totalcount / sum(totalcount)
>
>
>     Intuitively i tried :
>     > select totalcount / sum(totalcount) from
>     daily_count_per_kg_domain where timestamp_dt = '20140219' ;
>
>     FAILED: SemanticException [Error 10025]: Line 1:7 Expression not
>     in GROUP BY key 'totalcount'
>
>     I am not sure what group by on totalcount means..
>
>     Any ides ?
>
>
>


Mime
View raw message