hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mohit Durgapal <durgapalmo...@gmail.com>
Subject How to use joins and averages both in the same hive query
Date Wed, 03 Sep 2014 19:38:29 GMT
I have two tables in hive:

Table1: uid,txid,amt,vendor Table2: uid,txid

Now I need to join the tables on txid which basically confirms a
transaction is finally recorded. There will be some transactions which will
be present only in Table1 and not in Table2.

I need to find out number of avg of transaction matches found per user(uid)
per vendor. Then I need to find the avg of these averages by adding all the
averages and divide them by the number of unique users per vendor.

Let's say I have the data:

Table1:

u1,120,44,vend1
u1,199,33,vend1
u1,100,23,vend1
u1,101,24,vend1
u2,200,34,vend1
u2,202,32,vend2

Table2:

u1,100
u1,101
u2,200
u2,202

Example For vendor vend1:

u1-> Avg transaction find rate = 2(matches found in both Tables,Table1 and
Table2)/4(total occurrence in Table1) =0.5

u2 -> Avg transaction find rate = 1/1 = 1

Avg of avgs = 0.5+1(sum of avgs)/2(total unique users) = 0.75

Required output:

vend1,0.75
vend2,1

I can't seem to find count of both matches and occurrence in just Table1 in
one hive query per user per vendor. I have reached to this query and can't
find how to change it further.

SELECT A.vendor,A.uid,count(*) as totalmatchesperuser FROM Table1 A JOIN
Table2 B ON A.uid = B.uid AND B.txid =A.txid group by vendor,A.uid

Any help would be great.

Mime
View raw message