hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From saurabh <mpp.databa...@gmail.com>
Subject Re: Semantics of Rank.
Date Fri, 26 Jul 2013 21:30:35 GMT
Hi all,

Below are some of observations based on the on-going rank function
discussion.

1. I executed below mentioned queries  and only the query with "rank"
(lowercase) executed successfully, rest were throwing exceptions "FAILED:
SemanticException Failed to breakup Windowing invocations into Groups."

-  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition by
cust_id order by ord_dt);

-  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition by
cust_id order by ord_dt);

-   select cust_id, ord_dt, rank() w from cust_ord window w as (partition
by cust_id order by ord_dt);

It seems "rank" keyword is case-sensitive. Attached is the screenshot for
reference.

2. I created a dummy table with the data provided in the below mail trail
and achieved the expected output, using the below mentioned query.

*select cust_id, ord_dt, rank() over (partition by cust_id order by ord_dt)
from cust_ord;*

 Request all to kindly review these details and suggest if it was of any
help!

Thanks.


On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
j.barrett.strausser@gmail.com> wrote:

> Any further help on this, otherwise I'll file a jira.
>
>
> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
> j.barrett.strausser@gmail.com> wrote:
>
>> As an example : If I run my query above removing the arg the following is
>> thrown.
>>
>> FAILED: SemanticException Failed to breakup Windowing invocations into
>> Groups. At least 1 group must only depend on input columns. Also check for
>> circular dependencies.
>> Underlying error:
>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>> arguments are expected.
>>
>>
>> Similar issue and fix here:
>>
>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>
>> Even if it didn't require an arg it still doesn't explain my anomalous
>> output.
>>
>>
>>
>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>> j.barrett.strausser@gmail.com> wrote:
>>
>>> That isn't true. If you try to run the above HIVE without an argument,
>>> it will throw an exception. I have seen other users replicate this problem
>>> as well.
>>>
>>> I can file a JIRA if someone can confirm that my query should work.
>>>
>>>
>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>> manishbhoge@rocketmail.com> wrote:
>>>
>>>> Analytical function doesn't expect any argument. Rank() itself enough
>>>> to sequence based on the window you have defined in partition by. So
>>>>
>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>
>>>> Should work as long as I have wrote right syntax for hive.
>>>>
>>>> Sent via Rocket from my HTC
>>>>
>>>> ----- Reply message -----
>>>> From: "j.barrett Strausser" <j.barrett.strausser@gmail.com>
>>>> To: <user@hive.apache.org>
>>>> Subject: Semantics of Rank.
>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>
>>>>
>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>> between
>>>> rank and the windowing function is wrong.
>>>>
>>>> What I want to achieve for the following is : For a given customer id,
>>>> sort his orders. I thought the below would work.
>>>>
>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data eh
>>>> window w as (partition by cmscustid  order by orderdate);
>>>>
>>>> The rank function instead returns the rank of the order date over all
>>>> all
>>>> order dates.
>>>>
>>>> Example snippet from above
>>>>
>>>> Actual :
>>>>
>>>> 6758783    27APR2012    94
>>>> 6758783    23JUN2012    95
>>>> 6758785    14DEC2012    96
>>>> 6758795    18DEC2011    97
>>>> 6758796    06MAY2012    98
>>>> 6758798    24MAR2013    99
>>>> 6758799    23NOV2012    100
>>>>
>>>>
>>>> Expected :
>>>>
>>>> 6758783    27APR2012    1
>>>> 6758783    23JUN2012    2
>>>> 6758785    14DEC2012    1
>>>> 6758795    18DEC2011    1
>>>> 6758796    06MAY2012    1
>>>> 6758798    24MAR2013    1
>>>> 6758799    23NOV2012    1
>>>>
>>>>
>>>> -b
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <sglixman@outbrain.com
>>>> >wrote:
>>>>
>>>> > the argument to rank is simply some value, whereas the rank function
>>>> > compare this value
>>>> >  to the previous value received, if value is same, rank returns
>>>> ++index,
>>>> > otherwise, rank return 1.
>>>> > pseudo code:
>>>> >
>>>> > class Rank {
>>>> > int index;
>>>> > Object previousValue = null;
>>>> > int evaluate(Object value) {
>>>> >   if (value == previousValue) {
>>>> >     return ++index;
>>>> >   }
>>>> >   previousValue = value;
>>>> >   index = 1;
>>>> >   return 1;
>>>> > }
>>>> >
>>>> >
>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>> > j.barrett.strausser@gmail.com> wrote:
>>>> >
>>>> >> It seems as though I am required to pass in an argument to RANK().
>>>> >>
>>>> >> What is the effect of passing this argument in ?
>>>> >>
>>>> >> In a RANK function the output for RANK should be the number of rows
>>>> >> preceding
>>>> >> a row in a given window. As  windows are specified by the partition
>>>> and
>>>> >> order by fields I don't understand the effect of passing in an arg
>>>> to RANK
>>>> >>
>>>> >> Are the other non-aggregate function similar? Meaning : Cume_dist,
>>>> Ntile
>>>> >> --
>>>> >>
>>>> >>
>>>> >> https://github.com/bearrito
>>>> >> @deepbearrito
>>>> >>
>>>> >
>>>> >
>>>> > The above terms reflect a potential business arrangement, are
>>>> provided solely
>>>> > as a basis for further discussion, and are not intended to be and do
>>>> not
>>>> > constitute a legally binding obligation. No legally binding
>>>> obligations will
>>>> > be created, implied, or inferred until an agreement in final form is
>>>> executed
>>>> > in writing by all parties involved.
>>>> >
>>>> > This email and any attachments hereto may be confidential or
>>>> privileged.
>>>> >  If you received this communication by mistake, please don't forward
>>>> it
>>>> > to anyone else, please erase all copies and attachments, and please
>>>> let
>>>> > me know that it has gone to the wrong person. Thanks.
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> https://github.com/bearrito
>>>> @deepbearrito
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> https://github.com/bearrito
>>> @deepbearrito
>>>
>>
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>
>
> --
>
>
> https://github.com/bearrito
> @deepbearrito
>

Mime
View raw message