hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: Semantics of Rank.
Date Tue, 03 Sep 2013 08:23:25 GMT
Another email thread led me to
HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank
operator is case-sensitive and has odd semantics") -- it's resolved
as invalid, but is that only for the odd semantics?

Perhaps this issue is clarified in more recent emails.  I'm catching up on
a huge backlog.

-- Lefty


On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <leftyleverenz@gmail.com>wrote:

> What's the answer -- does the "rank" keyword have to be lowercase?
>
> If lowercase is obligatory we need to revise the wiki, which shows all
> uppercase (
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
> ).
>
> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
> I don't know if that means lowercase is required:
>
> @WindowFunctionDescription
>>
>> (
>>
>> description = @Description(
>>
>> name = "rank",
>>
>> value = "_FUNC_(x)"
>>
>> ),
>>
>> supportsWindow = false,
>>
>> pivotResult = true
>>
>> )
>>
>
>
> And what about the other keywords in the wikidoc?  Same lowercase
> requirement?
>
> -- Lefty
>
>
> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mpp.databases@gmail.com> wrote:
>
>> Hi all,
>>
>> Below are some of observations based on the on-going rank function
>> discussion.
>>
>> 1. I executed below mentioned queries  and only the query with "rank"
>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>> SemanticException Failed to breakup Windowing invocations into Groups."
>>
>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition
>> by cust_id order by ord_dt);
>>
>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition
>> by cust_id order by ord_dt);
>>
>> -   select cust_id, ord_dt, rank() w from cust_ord window w as (partition
>> by cust_id order by ord_dt);
>>
>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>> for reference.
>>
>> 2. I created a dummy table with the data provided in the below mail trail
>> and achieved the expected output, using the below mentioned query.
>>
>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>> ord_dt) from cust_ord;*
>>
>>  Request all to kindly review these details and suggest if it was of any
>> help!
>>
>> Thanks.
>>
>>
>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>> j.barrett.strausser@gmail.com> wrote:
>>
>>> Any further help on this, otherwise I'll file a jira.
>>>
>>>
>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>> j.barrett.strausser@gmail.com> wrote:
>>>
>>>> As an example : If I run my query above removing the arg the following
>>>> is thrown.
>>>>
>>>> FAILED: SemanticException Failed to breakup Windowing invocations into
>>>> Groups. At least 1 group must only depend on input columns. Also check for
>>>> circular dependencies.
>>>> Underlying error:
>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>> arguments are expected.
>>>>
>>>>
>>>> Similar issue and fix here:
>>>>
>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>
>>>> Even if it didn't require an arg it still doesn't explain my anomalous
>>>> output.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>> j.barrett.strausser@gmail.com> wrote:
>>>>
>>>>> That isn't true. If you try to run the above HIVE without an argument,
>>>>> it will throw an exception. I have seen other users replicate this problem
>>>>> as well.
>>>>>
>>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>
>>>>>> Analytical function doesn't expect any argument. Rank() itself enough
>>>>>> to sequence based on the window you have defined in partition by.
So
>>>>>>
>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>
>>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>>
>>>>>> Sent via Rocket from my HTC
>>>>>>
>>>>>> ----- Reply message -----
>>>>>> From: "j.barrett Strausser" <j.barrett.strausser@gmail.com>
>>>>>> To: <user@hive.apache.org>
>>>>>> Subject: Semantics of Rank.
>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>
>>>>>>
>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>>> between
>>>>>> rank and the windowing function is wrong.
>>>>>>
>>>>>> What I want to achieve for the following is : For a given customer
id,
>>>>>> sort his orders. I thought the below would work.
>>>>>>
>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data
>>>>>> eh
>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>
>>>>>> The rank function instead returns the rank of the order date over
all
>>>>>> all
>>>>>> order dates.
>>>>>>
>>>>>> Example snippet from above
>>>>>>
>>>>>> Actual :
>>>>>>
>>>>>> 6758783    27APR2012    94
>>>>>> 6758783    23JUN2012    95
>>>>>> 6758785    14DEC2012    96
>>>>>> 6758795    18DEC2011    97
>>>>>> 6758796    06MAY2012    98
>>>>>> 6758798    24MAR2013    99
>>>>>> 6758799    23NOV2012    100
>>>>>>
>>>>>>
>>>>>> Expected :
>>>>>>
>>>>>> 6758783    27APR2012    1
>>>>>> 6758783    23JUN2012    2
>>>>>> 6758785    14DEC2012    1
>>>>>> 6758795    18DEC2011    1
>>>>>> 6758796    06MAY2012    1
>>>>>> 6758798    24MAR2013    1
>>>>>> 6758799    23NOV2012    1
>>>>>>
>>>>>>
>>>>>> -b
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>> sglixman@outbrain.com>wrote:
>>>>>>
>>>>>> > the argument to rank is simply some value, whereas the rank
function
>>>>>> > compare this value
>>>>>> >  to the previous value received, if value is same, rank returns
>>>>>> ++index,
>>>>>> > otherwise, rank return 1.
>>>>>> > pseudo code:
>>>>>> >
>>>>>> > class Rank {
>>>>>> > int index;
>>>>>> > Object previousValue = null;
>>>>>> > int evaluate(Object value) {
>>>>>> >   if (value == previousValue) {
>>>>>> >     return ++index;
>>>>>> >   }
>>>>>> >   previousValue = value;
>>>>>> >   index = 1;
>>>>>> >   return 1;
>>>>>> > }
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>> >
>>>>>> >> It seems as though I am required to pass in an argument
to RANK().
>>>>>> >>
>>>>>> >> What is the effect of passing this argument in ?
>>>>>> >>
>>>>>> >> In a RANK function the output for RANK should be the number
of rows
>>>>>> >> preceding
>>>>>> >> a row in a given window. As  windows are specified by the
>>>>>> partition and
>>>>>> >> order by fields I don't understand the effect of passing
in an arg
>>>>>> to RANK
>>>>>> >>
>>>>>> >> Are the other non-aggregate function similar? Meaning :
Cume_dist,
>>>>>> Ntile
>>>>>> >> --
>>>>>> >>
>>>>>> >>
>>>>>> >> https://github.com/bearrito
>>>>>> >> @deepbearrito
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>> provided solely
>>>>>> > as a basis for further discussion, and are not intended to be
and
>>>>>> do not
>>>>>> > constitute a legally binding obligation. No legally binding
>>>>>> obligations will
>>>>>> > be created, implied, or inferred until an agreement in final
form
>>>>>> is executed
>>>>>> > in writing by all parties involved.
>>>>>> >
>>>>>> > This email and any attachments hereto may be confidential or
>>>>>> privileged.
>>>>>> >  If you received this communication by mistake, please don't
>>>>>> forward it
>>>>>> > to anyone else, please erase all copies and attachments, and
please
>>>>>> let
>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> https://github.com/bearrito
>>>> @deepbearrito
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> https://github.com/bearrito
>>> @deepbearrito
>>>
>>
>>
>

Mime
View raw message