hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "j.barrett Strausser" <j.barrett.straus...@gmail.com>
Subject Re: Semantics of Rank.
Date Tue, 03 Sep 2013 13:15:44 GMT
I filed HIVE-5038. My issue was that it seemed incorrect that rank took an
arg. I haven't verified that the issue is fixed in trunk as claimed in the
JIRA.However, between the reponse in JIRA and in this thread it seems
resolved.


On Tue, Sep 3, 2013 at 4:23 AM, Lefty Leverenz <leftyleverenz@gmail.com>wrote:

> Another email thread led me to HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank
operator is case-sensitive and has odd semantics") -- it's resolved
> as invalid, but is that only for the odd semantics?
>
> Perhaps this issue is clarified in more recent emails.  I'm catching up on
> a huge backlog.
>
> -- Lefty
>
>
> On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <leftyleverenz@gmail.com>wrote:
>
>> What's the answer -- does the "rank" keyword have to be lowercase?
>>
>> If lowercase is obligatory we need to revise the wiki, which shows all
>> uppercase (
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
>> ).
>>
>> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
>> I don't know if that means lowercase is required:
>>
>>  @WindowFunctionDescription
>>>
>>> (
>>>
>>> description = @Description(
>>>
>>> name = "rank",
>>>
>>> value = "_FUNC_(x)"
>>>
>>> ),
>>>
>>> supportsWindow = false,
>>>
>>> pivotResult = true
>>>
>>> )
>>>
>>
>>
>> And what about the other keywords in the wikidoc?  Same lowercase
>> requirement?
>>
>> -- Lefty
>>
>>
>> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mpp.databases@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Below are some of observations based on the on-going rank function
>>> discussion.
>>>
>>> 1. I executed below mentioned queries  and only the query with "rank"
>>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>>> SemanticException Failed to breakup Windowing invocations into Groups."
>>>
>>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition
>>> by cust_id order by ord_dt);
>>>
>>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition
>>> by cust_id order by ord_dt);
>>>
>>> -   select cust_id, ord_dt, rank() w from cust_ord window w as
>>> (partition by cust_id order by ord_dt);
>>>
>>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>>> for reference.
>>>
>>> 2. I created a dummy table with the data provided in the below mail
>>> trail and achieved the expected output, using the below mentioned query.
>>>
>>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>>> ord_dt) from cust_ord;*
>>>
>>>  Request all to kindly review these details and suggest if it was of any
>>> help!
>>>
>>> Thanks.
>>>
>>>
>>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>>> j.barrett.strausser@gmail.com> wrote:
>>>
>>>> Any further help on this, otherwise I'll file a jira.
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>>> j.barrett.strausser@gmail.com> wrote:
>>>>
>>>>> As an example : If I run my query above removing the arg the following
>>>>> is thrown.
>>>>>
>>>>> FAILED: SemanticException Failed to breakup Windowing invocations into
>>>>> Groups. At least 1 group must only depend on input columns. Also check
for
>>>>> circular dependencies.
>>>>> Underlying error:
>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>>> arguments are expected.
>>>>>
>>>>>
>>>>> Similar issue and fix here:
>>>>>
>>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>>
>>>>> Even if it didn't require an arg it still doesn't explain my anomalous
>>>>> output.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>
>>>>>> That isn't true. If you try to run the above HIVE without an
>>>>>> argument, it will throw an exception. I have seen other users replicate
>>>>>> this problem as well.
>>>>>>
>>>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>>
>>>>>>> Analytical function doesn't expect any argument. Rank() itself
>>>>>>> enough to sequence based on the window you have defined in partition
by. So
>>>>>>>
>>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>>
>>>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>>>
>>>>>>> Sent via Rocket from my HTC
>>>>>>>
>>>>>>> ----- Reply message -----
>>>>>>> From: "j.barrett Strausser" <j.barrett.strausser@gmail.com>
>>>>>>> To: <user@hive.apache.org>
>>>>>>> Subject: Semantics of Rank.
>>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>>
>>>>>>>
>>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>>>> between
>>>>>>> rank and the windowing function is wrong.
>>>>>>>
>>>>>>> What I want to achieve for the following is : For a given customer
>>>>>>> id,
>>>>>>> sort his orders. I thought the below would work.
>>>>>>>
>>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM
>>>>>>> order_data eh
>>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>>
>>>>>>> The rank function instead returns the rank of the order date
over
>>>>>>> all all
>>>>>>> order dates.
>>>>>>>
>>>>>>> Example snippet from above
>>>>>>>
>>>>>>> Actual :
>>>>>>>
>>>>>>> 6758783    27APR2012    94
>>>>>>> 6758783    23JUN2012    95
>>>>>>> 6758785    14DEC2012    96
>>>>>>> 6758795    18DEC2011    97
>>>>>>> 6758796    06MAY2012    98
>>>>>>> 6758798    24MAR2013    99
>>>>>>> 6758799    23NOV2012    100
>>>>>>>
>>>>>>>
>>>>>>> Expected :
>>>>>>>
>>>>>>> 6758783    27APR2012    1
>>>>>>> 6758783    23JUN2012    2
>>>>>>> 6758785    14DEC2012    1
>>>>>>> 6758795    18DEC2011    1
>>>>>>> 6758796    06MAY2012    1
>>>>>>> 6758798    24MAR2013    1
>>>>>>> 6758799    23NOV2012    1
>>>>>>>
>>>>>>>
>>>>>>> -b
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>>> sglixman@outbrain.com>wrote:
>>>>>>>
>>>>>>> > the argument to rank is simply some value, whereas the rank
>>>>>>> function
>>>>>>> > compare this value
>>>>>>> >  to the previous value received, if value is same, rank
returns
>>>>>>> ++index,
>>>>>>> > otherwise, rank return 1.
>>>>>>> > pseudo code:
>>>>>>> >
>>>>>>> > class Rank {
>>>>>>> > int index;
>>>>>>> > Object previousValue = null;
>>>>>>> > int evaluate(Object value) {
>>>>>>> >   if (value == previousValue) {
>>>>>>> >     return ++index;
>>>>>>> >   }
>>>>>>> >   previousValue = value;
>>>>>>> >   index = 1;
>>>>>>> >   return 1;
>>>>>>> > }
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>>> >
>>>>>>> >> It seems as though I am required to pass in an argument
to RANK().
>>>>>>> >>
>>>>>>> >> What is the effect of passing this argument in ?
>>>>>>> >>
>>>>>>> >> In a RANK function the output for RANK should be the
number of
>>>>>>> rows
>>>>>>> >> preceding
>>>>>>> >> a row in a given window. As  windows are specified by
the
>>>>>>> partition and
>>>>>>> >> order by fields I don't understand the effect of passing
in an
>>>>>>> arg to RANK
>>>>>>> >>
>>>>>>> >> Are the other non-aggregate function similar? Meaning
:
>>>>>>> Cume_dist, Ntile
>>>>>>> >> --
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> https://github.com/bearrito
>>>>>>> >> @deepbearrito
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> > The above terms reflect a potential business arrangement,
are
>>>>>>> provided solely
>>>>>>> > as a basis for further discussion, and are not intended
to be and
>>>>>>> do not
>>>>>>> > constitute a legally binding obligation. No legally binding
>>>>>>> obligations will
>>>>>>> > be created, implied, or inferred until an agreement in final
form
>>>>>>> is executed
>>>>>>> > in writing by all parties involved.
>>>>>>> >
>>>>>>> > This email and any attachments hereto may be confidential
or
>>>>>>> privileged.
>>>>>>> >  If you received this communication by mistake, please don't
>>>>>>> forward it
>>>>>>> > to anyone else, please erase all copies and attachments,
and
>>>>>>> please let
>>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/bearrito
>>>>>>> @deepbearrito
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> https://github.com/bearrito
>>>> @deepbearrito
>>>>
>>>
>>>
>>
>


-- 


https://github.com/bearrito
@deepbearrito

Mime
View raw message