hive-user mailing list archives

From Cam Bazz <>
Subject data question
Date Mon, 31 Jan 2011 21:13:35 GMT

After doing some aggregate counting, I now have data in a table like this:

id     count    date_hour (this is a partition name)
1	3	2011310115
1	1	2011310116
2	1	2011310117
2	1	2011310118

and I need to turn this into:

1 [2011310115,2011310116] [3,1] 4
2 [2011310117,2011310118] [1,1] 2

Explanation: the first field is the id, the second field is a list of
date_hour values (the partitions from the previous table), the third
field is a list of counts, and the fourth field is the sum of the counts.
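
To pin down the shape I am after, here is a minimal sketch of the grouping in plain Python (not Hive). The function name group_by_id and the in-memory row list are hypothetical, used only for illustration; the rows are copied from the sample table, and the date_hour list simply follows the order of those rows.

```python
from collections import OrderedDict

# Rows as (id, count, date_hour), copied from the sample table.
rows = [
    (1, 3, "2011310115"),
    (1, 1, "2011310116"),
    (2, 1, "2011310117"),
    (2, 1, "2011310118"),
]


def group_by_id(rows):
    """Collect the date_hours and counts for each id, plus the sum of counts.

    Hypothetical helper: it only illustrates the desired output shape.
    """
    grouped = OrderedDict()
    for id_, count, date_hour in rows:
        # One pair of parallel lists per id: (date_hours, counts).
        hours, counts = grouped.setdefault(id_, ([], []))
        hours.append(date_hour)
        counts.append(count)
    return [
        (id_, hours, counts, sum(counts))
        for id_, (hours, counts) in grouped.items()
    ]


result = group_by_id(rows)
for row in result:
    print(row)
```

The two lists are kept parallel, so the n-th count belongs to the n-th date_hour.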

Given that I used date_hour as the partition key, how can I do this, or
accomplish something similar?

Currently I process the data hourly, but I might need to do another
aggregation to find daily results, i.e., iterate over multiple partitions,
maybe a month's worth of data, and generate the statistics daily instead
of hourly.
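
The daily rollup I have in mind could be sketched like this, again in plain Python rather than Hive. It assumes that the first eight characters of a date_hour value identify the day and the last two the hour; that is my reading of the sample keys, not something fixed. The name daily_totals is hypothetical.

```python
from collections import defaultdict

# Hourly rows as (id, count, date_hour), copied from the sample table.
hourly = [
    (1, 3, "2011310115"),
    (1, 1, "2011310116"),
    (2, 1, "2011310117"),
    (2, 1, "2011310118"),
]


def daily_totals(rows):
    """Sum the hourly counts per (id, day).

    Assumption: the day is the date_hour value with its last two
    characters (the hour) removed, i.e. the first eight characters.
    """
    totals = defaultdict(int)
    for id_, count, date_hour in rows:
        day = date_hour[:8]
        totals[(id_, day)] += count
    return dict(totals)


daily = daily_totals(hourly)
for key in sorted(daily):
    print(key, daily[key])
```

Running this over a month of partitions would give one total per id per day instead of one per hour.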

Any ideas or recommendations are greatly appreciated.

Best Regards,

