hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From comptech geeky <comptechge...@gmail.com>
Subject Re: Something wrong with my query to get TOP 3?
Date Thu, 19 Jul 2012 20:41:42 GMT
I wrote this query after modifying it-

*SELECT buyer_id, item_id, rank(buyer_id), created_time,
UNIX_TIMESTAMP(created_time)*
*FROM (*
*    SELECT buyer_id, item_id, created_time*
*    FROM testingtable1*
* where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as int))) =
'2012-07-09'*
*    DISTRIBUTE BY buyer_id,item_id*
*    SORT BY buyer_id, created_time desc*
*) a*
*WHERE rank(buyer_id) < 3;*

And the output I got is which is sligtly wrong as it is missing two rows-

*1345653    330760137950       2012-07-09 21:43:29*
*1345653    330760137950       2012-07-09 21:42:29*
*1345653    330760137950       2012-07-09 21:41:29*

These two rows are missing-

1345653    110909316904       2012-07-09 21:29:06
1345653    221065796761       2012-07-09 19:31:48

So full output should be like this-

*1345653    330760137950       2012-07-09 21:43:29*
*1345653    330760137950       2012-07-09 21:42:29*
*1345653    330760137950       2012-07-09 21:41:29*
1345653    110909316904       2012-07-09 21:29:06
1345653    221065796761       2012-07-09 19:31:48


On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <comptechgeeky@gmail.com>wrote:

> Can you show me the exact query that I need to do for this particular
> problem consideing my scenario? It will be of great help to me. As I am new
> to HiveQL.
>
> I need TOP 3 for those if BID and PID gets matched but with different
> timestamp.
>
>
>
> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans <
> philip.j.tromans@gmail.com> wrote:
>
>> Your rank() is being evaluated map side. Put your distribute by and sort
>> by in an inner query, and then evaluate your rank() in an outer query.
>>
>> Phil.
>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechgeeky@gmail.com>
>> wrote:
>>
>>> This is the below data in my Table1
>>>
>>>
>>> BID       PID                       TIME
>>> --------------+-------------------------+--------------------------------
>>> 1345653   330760137950       2012-07-09 21:42:29
>>> 1345653   330760137950       2012-07-09 21:43:29
>>> 1345653   330760137950       2012-07-09 21:40:29
>>> 1345653   330760137950       2012-07-09 21:41:29
>>> 1345653   110909316904       2012-07-09 21:29:06
>>> 1345653   221065796761       2012-07-09 19:31:48
>>>
>>> So If I need to clarify the above scenario- I have data in above table
>>> like this-
>>> For USER *`1345653` *I have this PID `*330760137950` *four times but
>>> with different timestamps in red color. So I need the output something like
>>> this-
>>>
>>> Output that I need:-
>>>
>>> *1345653    330760137950       2012-07-09 21:43:29 *
>>> *1345653    330760137950       2012-07-09 21:42:29 *
>>> *1345653    330760137950       2012-07-09 21:41:29*
>>> 1345653    110909316904       2012-07-09 21:29:06
>>> 1345653    221065796761       2012-07-09 19:31:48
>>>
>>> So Basically If BID and PID are same but with different timestamps, then
>>> I need TOP 3 sorted with TIME in descending order
>>>
>>> And for this I created rank UDF (User Defined Function). And I wrote the
>>> below query but its not working for me. Can anyone help me on this?
>>> *
>>> *
>>> *
>>> *
>>> *SELECT buyer_id, item_id, created_time*
>>> *    FROM table1*
>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>> int))) = '2012-07-09'*
>>> *    DISTRIBUTE BY buyer_id*
>>> *    SORT BY buyer_id, created_time desc*
>>>
>>>
>

Mime
View raw message