hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: hive query
Date Fri, 12 Aug 2016 15:14:10 GMT
If the keyword-hour combination is not unique, I would recommend due to performance considerations,
doing the aggregation before the join.
An advanced optimizer might do it by itself (partial aggregation) but I wouldn’t count on
it.

Dudu


select          k.keyword
               ,h.hour
               ,coalesce (t.totalcount,0)  as totalcount
from                            (select distinct keyword from t)                         
                      as k

                 cross join     (select h.pos as hour from (select posexplode(split(space(22),'')))
as h)       as h

                 left join     (select      t.keyword
                                           ,t.hour

                                           ,sum (t.totalcount)  as totalcount

                                from        t

                                group by    t.keyword
                                           ,t.hour
                                )
                                as t

                 on             t.keyword       =
                                k.keyword

                            and t.hour          =
                                h.hour
order by        h.hour
               ,k.keyword
;



From: Joanne Chan [mailto:joannec430@gmail.com]
Sent: Friday, August 12, 2016 5:21 PM
To: user@hive.apache.org
Subject: Re: hive query

The query is assuming Keyword/Hour is unique which I am not sure if that's an assumption per
requirement.

If not, you'd probably want to group by those two columns.

select k.keyword
     , h.hour
     , sum(coalesce(t.totalcount,0))
from (select distinct keyword from t)                                                as k
join (select h.pos as hour from (select posexplode(split(space(22),''))) as h)       as h
left join t
  on t.keyword = k.keyword
 and t.hour    = h.hour
group by h.hour, k.keyword
order by h.hour, k.keyword
;

Nice trick on the `posexplode(split(space`

On Fri, Aug 12, 2016 at 9:28 AM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
Hi Raj

Here is the code.

Dudu


create table t
(
    Keyword         string
   ,Hour            tinyint
   ,TotalCount      bigint
)
;
 insert into table t values
  ('Iphone' ,11 ,500)
,('Iphone' ,12,1000)
,('Samsung',11, 300)
,('Samsung',12, 600)
,('Nokia'  ,12, 200)
,('Iphone' ,16,1500)
;

select          k.keyword
               ,h.hour
               ,t.totalcount

from                            (select distinct keyword from t)                         
                      as k

                 cross join     (select h.pos as hour from (select posexplode(split(space(22),'')))
as h)       as h

                 left join      t

                 on             t.keyword       =
                                k.keyword

                            and t.hour          =
                                h.hour

order by        h.hour
               ,k.keyword
;


keyword

hour

totalcount

Iphone

0



Nokia

0



Samsung

0



Iphone

1



Nokia

1



Samsung

1



Iphone

2



Nokia

2



Samsung

2



Iphone

3



Nokia

3



Samsung

3



Iphone

4



Nokia

4



Samsung

4



Iphone

5



Nokia

5



Samsung

5



Iphone

6



Nokia

6



Samsung

6



Iphone

7



Nokia

7



Samsung

7



Iphone

8



Nokia

8



Samsung

8



Iphone

9



Nokia

9



Samsung

9



Iphone

10



Nokia

10



Samsung

10



Iphone

11

500

Nokia

11



Samsung

11

300

Iphone

12

1000

Nokia

12

200

Samsung

12

600

Iphone

13



Nokia

13



Samsung

13



Iphone

14



Nokia

14



Samsung

14



Iphone

15



Nokia

15



Samsung

15



Iphone

16

1500

Nokia

16



Samsung

16



Iphone

17



Nokia

17



Samsung

17



Iphone

18



Nokia

18



Samsung

18



Iphone

19



Nokia

19



Samsung

19



Iphone

20



Nokia

20



Samsung

20



Iphone

21



Nokia

21



Samsung

21



Iphone

22



Nokia

22



Samsung

22



Iphone

23



Nokia

23



Samsung

23





From: raj hive [mailto:raj.hiveql@gmail.com<mailto:raj.hiveql@gmail.com>]
Sent: Friday, August 12, 2016 3:28 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: hive query

Dear Friends,
I have a hive table with column name Keyword,Hour,TotalTweets.
for example, I have the date for three keyword as below.
Keyword    Hour    TotalCount
 iphone       11           500
 iphone       12           1000
 Samsung   11           300
 Samsung   12           600
 Nokia         12           200
 Iphone       16          1500
 I want a query to get output for 24 hours like below. I need to show the zero count if i
don't have the data. Can anyone help me the hive query.

Keyword

hour

TotalCount

iphone

0

0

samsung

0

0

nokia

0

0

iphone

1

0

samsung

1

0

nokia

1

0

iphone

2

0

samsung

2

0

nokia

2

0

iphone

3

0

samsung

3

0

nokia

3

0

iphone

4

0

samsung

4

0

nokia

4

0

iphone

5

0

samsung

5

0

nokia

5

0

iphone

6

0

samsung

6

0

nokia

6

0

iphone

7

0

samsung

7

0

nokia

7

0

iphone

8

0

samsung

8

0

nokia

8

0

iphone

9

0

samsung

9

0

nokia

9

0

iphone

10

0

samsung

10

0

nokia

10

0

iphone

11

500

samsung

11

300

nokia

11

0

iphone

12

1000

samsung

12

600

nokia

12

200

iphone

13

0

samsung

13

0

nokia

13

0

iphone

14

0

samsung

14

0

nokia

14

0

iphone

15

0

samsung

15

0

nokia

15

0

iphone

16

0

samsung

16

0

nokia

16

1500

iphone

17

0

samsung

17

0

nokia

17

0

iphone

18

0

samsung

18

0

nokia

18

0

iphone

19

0

samsung

19

0

nokia

19

0

iphone

20

0

samsung

20

0

nokia

20

0

iphone

21

0

samsung

21

0

nokia

21

0

iphone

22

0

samsung

22

0

nokia

22

0

iphone

23

0

samsung

23

0

nokia

23

0




Mime
View raw message