hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: Semantics of Rank.
Date Tue, 24 Sep 2013 09:29:22 GMT
Thanks, I've added a note about case-insensitivity to the UDF doc and the
Tutorial:

   - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
   -
   https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Builtinoperatorsandfunctions

-- Lefty


On Tue, Sep 3, 2013 at 7:38 AM, Edward Capriolo <edlinuxguru@gmail.com>wrote:

> Beyond that answer please note:
> Udf names are always case insensitive.  RaNk = RANK = rank
> For rank this is a special case that was fixed.
>
>
> On Tue, Sep 3, 2013 at 9:15 AM, j.barrett Strausser <
> j.barrett.strausser@gmail.com> wrote:
>
>> I filed HIVE-5038. My issue was that it seemed incorrect that rank took
>> an arg. I haven't verified that the issue is fixed in trunk as claimed in
>> the JIRA.However, between the reponse in JIRA and in this thread it seems
>> resolved.
>>
>>
>> On Tue, Sep 3, 2013 at 4:23 AM, Lefty Leverenz <leftyleverenz@gmail.com>wrote:
>>
>>> Another email thread led me to HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank
operator is case-sensitive and has odd semantics") -- it's resolved
>>> as invalid, but is that only for the odd semantics?
>>>
>>> Perhaps this issue is clarified in more recent emails.  I'm catching up
>>> on a huge backlog.
>>>
>>> -- Lefty
>>>
>>>
>>> On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <leftyleverenz@gmail.com>wrote:
>>>
>>>> What's the answer -- does the "rank" keyword have to be lowercase?
>>>>
>>>> If lowercase is obligatory we need to revise the wiki, which shows all
>>>> uppercase (
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
>>>> ).
>>>>
>>>> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>>>>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
>>>> I don't know if that means lowercase is required:
>>>>
>>>>  @WindowFunctionDescription
>>>>>
>>>>> (
>>>>>
>>>>> description = @Description(
>>>>>
>>>>> name = "rank",
>>>>>
>>>>> value = "_FUNC_(x)"
>>>>>
>>>>> ),
>>>>>
>>>>> supportsWindow = false,
>>>>>
>>>>> pivotResult = true
>>>>>
>>>>> )
>>>>>
>>>>
>>>>
>>>> And what about the other keywords in the wikidoc?  Same lowercase
>>>> requirement?
>>>>
>>>> -- Lefty
>>>>
>>>>
>>>> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mpp.databases@gmail.com>wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Below are some of observations based on the on-going rank function
>>>>> discussion.
>>>>>
>>>>> 1. I executed below mentioned queries  and only the query with "rank"
>>>>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>>>>> SemanticException Failed to breakup Windowing invocations into Groups."
>>>>>
>>>>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as
>>>>> (partition by cust_id order by ord_dt);
>>>>>
>>>>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as
>>>>> (partition by cust_id order by ord_dt);
>>>>>
>>>>> -   select cust_id, ord_dt, rank() w from cust_ord window w as
>>>>> (partition by cust_id order by ord_dt);
>>>>>
>>>>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>>>>> for reference.
>>>>>
>>>>> 2. I created a dummy table with the data provided in the below mail
>>>>> trail and achieved the expected output, using the below mentioned query.
>>>>>
>>>>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>>>>> ord_dt) from cust_ord;*
>>>>>
>>>>>  Request all to kindly review these details and suggest if it was of
>>>>> any help!
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>
>>>>>> Any further help on this, otherwise I'll file a jira.
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>>
>>>>>>> As an example : If I run my query above removing the arg the
>>>>>>> following is thrown.
>>>>>>>
>>>>>>> FAILED: SemanticException Failed to breakup Windowing invocations
>>>>>>> into Groups. At least 1 group must only depend on input columns.
Also check
>>>>>>> for circular dependencies.
>>>>>>> Underlying error:
>>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One
or more
>>>>>>> arguments are expected.
>>>>>>>
>>>>>>>
>>>>>>> Similar issue and fix here:
>>>>>>>
>>>>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>>>>
>>>>>>> Even if it didn't require an arg it still doesn't explain my
>>>>>>> anomalous output.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>>>
>>>>>>>> That isn't true. If you try to run the above HIVE without
an
>>>>>>>> argument, it will throw an exception. I have seen other users
replicate
>>>>>>>> this problem as well.
>>>>>>>>
>>>>>>>> I can file a JIRA if someone can confirm that my query should
work.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com
<
>>>>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>>>>
>>>>>>>>> Analytical function doesn't expect any argument. Rank()
itself
>>>>>>>>> enough to sequence based on the window you have defined
in partition by. So
>>>>>>>>>
>>>>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>>>>
>>>>>>>>> Should work as long as I have wrote right syntax for
hive.
>>>>>>>>>
>>>>>>>>> Sent via Rocket from my HTC
>>>>>>>>>
>>>>>>>>> ----- Reply message -----
>>>>>>>>> From: "j.barrett Strausser" <j.barrett.strausser@gmail.com>
>>>>>>>>> To: <user@hive.apache.org>
>>>>>>>>> Subject: Semantics of Rank.
>>>>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for the reply. Perhaps my misunderstanding of
the relation
>>>>>>>>> between
>>>>>>>>> rank and the windowing function is wrong.
>>>>>>>>>
>>>>>>>>> What I want to achieve for the following is : For a given
customer
>>>>>>>>> id,
>>>>>>>>> sort his orders. I thought the below would work.
>>>>>>>>>
>>>>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w
FROM
>>>>>>>>> order_data eh
>>>>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>>>>
>>>>>>>>> The rank function instead returns the rank of the order
date over
>>>>>>>>> all all
>>>>>>>>> order dates.
>>>>>>>>>
>>>>>>>>> Example snippet from above
>>>>>>>>>
>>>>>>>>> Actual :
>>>>>>>>>
>>>>>>>>> 6758783    27APR2012    94
>>>>>>>>> 6758783    23JUN2012    95
>>>>>>>>> 6758785    14DEC2012    96
>>>>>>>>> 6758795    18DEC2011    97
>>>>>>>>> 6758796    06MAY2012    98
>>>>>>>>> 6758798    24MAR2013    99
>>>>>>>>> 6758799    23NOV2012    100
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Expected :
>>>>>>>>>
>>>>>>>>> 6758783    27APR2012    1
>>>>>>>>> 6758783    23JUN2012    2
>>>>>>>>> 6758785    14DEC2012    1
>>>>>>>>> 6758795    18DEC2011    1
>>>>>>>>> 6758796    06MAY2012    1
>>>>>>>>> 6758798    24MAR2013    1
>>>>>>>>> 6758799    23NOV2012    1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -b
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>>>>> sglixman@outbrain.com>wrote:
>>>>>>>>>
>>>>>>>>> > the argument to rank is simply some value, whereas
the rank
>>>>>>>>> function
>>>>>>>>> > compare this value
>>>>>>>>> >  to the previous value received, if value is same,
rank returns
>>>>>>>>> ++index,
>>>>>>>>> > otherwise, rank return 1.
>>>>>>>>> > pseudo code:
>>>>>>>>> >
>>>>>>>>> > class Rank {
>>>>>>>>> > int index;
>>>>>>>>> > Object previousValue = null;
>>>>>>>>> > int evaluate(Object value) {
>>>>>>>>> >   if (value == previousValue) {
>>>>>>>>> >     return ++index;
>>>>>>>>> >   }
>>>>>>>>> >   previousValue = value;
>>>>>>>>> >   index = 1;
>>>>>>>>> >   return 1;
>>>>>>>>> > }
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser
<
>>>>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> >> It seems as though I am required to pass in
an argument to
>>>>>>>>> RANK().
>>>>>>>>> >>
>>>>>>>>> >> What is the effect of passing this argument
in ?
>>>>>>>>> >>
>>>>>>>>> >> In a RANK function the output for RANK should
be the number of
>>>>>>>>> rows
>>>>>>>>> >> preceding
>>>>>>>>> >> a row in a given window. As  windows are specified
by the
>>>>>>>>> partition and
>>>>>>>>> >> order by fields I don't understand the effect
of passing in an
>>>>>>>>> arg to RANK
>>>>>>>>> >>
>>>>>>>>> >> Are the other non-aggregate function similar?
Meaning :
>>>>>>>>> Cume_dist, Ntile
>>>>>>>>> >> --
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> https://github.com/bearrito
>>>>>>>>> >> @deepbearrito
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > The above terms reflect a potential business arrangement,
are
>>>>>>>>> provided solely
>>>>>>>>> > as a basis for further discussion, and are not intended
to be
>>>>>>>>> and do not
>>>>>>>>> > constitute a legally binding obligation. No legally
binding
>>>>>>>>> obligations will
>>>>>>>>> > be created, implied, or inferred until an agreement
in final
>>>>>>>>> form is executed
>>>>>>>>> > in writing by all parties involved.
>>>>>>>>> >
>>>>>>>>> > This email and any attachments hereto may be confidential
or
>>>>>>>>> privileged.
>>>>>>>>> >  If you received this communication by mistake,
please don't
>>>>>>>>> forward it
>>>>>>>>> > to anyone else, please erase all copies and attachments,
and
>>>>>>>>> please let
>>>>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/bearrito
>>>>>>>>> @deepbearrito
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/bearrito
>>>>>>>> @deepbearrito
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/bearrito
>>>>>>> @deepbearrito
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>

Mime
View raw message