hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Tatarinov <i...@decide.com>
Subject Re: Something wrong with my query to get TOP 3?
Date Thu, 19 Jul 2012 22:57:14 GMT
Sorry, just pid needs to be dropped from both DISTRIBUTE and SORT clauses.
Your very first query was correct except for the nested subquery part. (You
don't need a double-nested subquery.)

On Thu, Jul 19, 2012 at 3:48 PM, comptech geeky <comptechgeeky@gmail.com>wrote:

> Hi Igor,
>
> I am not sure what I have to remove from Distribute By as in distribute by
> we have bid, pid and you said remove bid and time from distribute by and it
> doesn't have time
>
> *SELECT bid, pid, rank FROM *
>       *(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time)  FROM
> *
>   *
>            ( SELECT bid, pid, time FROM table1*
> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) =
> '2012-07-09' *
> *            DISTRIBUTE BY bid,pid*
> *            SORT BY bid,pid, time desc) A
> *
> *
>       ) B
> *
> *WHERE rank < 3;*
>
>
> And also I tried running the above query as it is. I am not getting
> expected output instead of that I am getting output like this which is
> wrong If you compare my expected output with the below output-
>
> *1345653 110909316904    0*
> *1345653 221065796761    1*
> *1345653 330760137950    2*
>
>
> On Thu, Jul 19, 2012 at 3:43 PM, Igor Tatarinov <igor@decide.com> wrote:
>
>> Remove pid,time from DISTRIBUTE BY.
>>
>> On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky <comptechgeeky@gmail.com>wrote:
>>
>>> Modified Query that I wrote and its not working as expected output is.
>>>
>>> *
>>> *
>>> *SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)*
>>> *FROM (*
>>> *    SELECT bid, pid, time*
>>> *    FROM table1*
>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) =
>>> '2012-07-09'*
>>> *    DISTRIBUTE BY bid,pid,time*
>>> *    SORT BY bid, time desc*
>>> *) a*
>>> *WHERE rank(bid) < 3;*
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 19, 2012 at 1:41 PM, comptech geeky <comptechgeeky@gmail.com
>>> > wrote:
>>>
>>>> I wrote this query after modifying it-
>>>>
>>>> *SELECT buyer_id, item_id, rank(buyer_id), created_time,
>>>> UNIX_TIMESTAMP(created_time)*
>>>> *FROM (*
>>>> *    SELECT buyer_id, item_id, created_time*
>>>> *    FROM testingtable1*
>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as
>>>> int))) = '2012-07-09'*
>>>> *    DISTRIBUTE BY buyer_id,item_id*
>>>> *    SORT BY buyer_id, created_time desc*
>>>> *) a*
>>>> *WHERE rank(buyer_id) < 3;*
>>>>
>>>> And the output I got is which is sligtly wrong as it is missing two
>>>> rows-
>>>>
>>>> *1345653    330760137950       2012-07-09 21:43:29*
>>>> *1345653    330760137950       2012-07-09 21:42:29*
>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>
>>>> These two rows are missing-
>>>>
>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>
>>>>  So full output should be like this-
>>>>
>>>> *1345653    330760137950       2012-07-09 21:43:29*
>>>> *1345653    330760137950       2012-07-09 21:42:29*
>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>
>>>>
>>>> On Thu, Jul 19, 2012 at 1:29 PM, comptech geeky <
>>>> comptechgeeky@gmail.com> wrote:
>>>>
>>>>> Can you show me the exact query that I need to do for this particular
>>>>> problem consideing my scenario? It will be of great help to me. As I
am new
>>>>> to HiveQL.
>>>>>
>>>>> I need TOP 3 for those if BID and PID gets matched but with different
>>>>> timestamp.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 19, 2012 at 1:15 PM, Philip Tromans <
>>>>> philip.j.tromans@gmail.com> wrote:
>>>>>
>>>>>> Your rank() is being evaluated map side. Put your distribute by and
>>>>>> sort by in an inner query, and then evaluate your rank() in an outer
query.
>>>>>>
>>>>>> Phil.
>>>>>> On Jul 19, 2012 9:00 PM, "comptech geeky" <comptechgeeky@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is the below data in my Table1
>>>>>>>
>>>>>>>
>>>>>>> BID       PID                       TIME
>>>>>>>
>>>>>>> --------------+-------------------------+--------------------------------
>>>>>>> 1345653   330760137950       2012-07-09 21:42:29
>>>>>>> 1345653   330760137950       2012-07-09 21:43:29
>>>>>>> 1345653   330760137950       2012-07-09 21:40:29
>>>>>>> 1345653   330760137950       2012-07-09 21:41:29
>>>>>>> 1345653   110909316904       2012-07-09 21:29:06
>>>>>>> 1345653   221065796761       2012-07-09 19:31:48
>>>>>>>
>>>>>>> So If I need to clarify the above scenario- I have data in above
>>>>>>> table like this-
>>>>>>> For USER *`1345653` *I have this PID `*330760137950` *four times
>>>>>>> but with different timestamps in red color. So I need the output
something
>>>>>>> like this-
>>>>>>>
>>>>>>> Output that I need:-
>>>>>>>
>>>>>>> *1345653    330760137950       2012-07-09 21:43:29 *
>>>>>>> *1345653    330760137950       2012-07-09 21:42:29 *
>>>>>>> *1345653    330760137950       2012-07-09 21:41:29*
>>>>>>> 1345653    110909316904       2012-07-09 21:29:06
>>>>>>> 1345653    221065796761       2012-07-09 19:31:48
>>>>>>>
>>>>>>> So Basically If BID and PID are same but with different timestamps,
>>>>>>> then I need TOP 3 sorted with TIME in descending order
>>>>>>>
>>>>>>> And for this I created rank UDF (User Defined Function). And
I wrote
>>>>>>> the below query but its not working for me. Can anyone help me
on this?
>>>>>>> *
>>>>>>> *
>>>>>>> *
>>>>>>> *
>>>>>>> *SELECT buyer_id, item_id, created_time*
>>>>>>> *    FROM table1*
>>>>>>> * where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time)
as
>>>>>>> int))) = '2012-07-09'*
>>>>>>> *    DISTRIBUTE BY buyer_id*
>>>>>>> *    SORT BY buyer_id, created_time desc*
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message