hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Navis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9000) LAST_VALUE Window function returns wrong results
Date Wed, 17 Dec 2014 05:52:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249520#comment-14249520
] 

Navis commented on HIVE-9000:
-----------------------------

[~mgrover] I think it's correct. You can acquire expected result by,
{code}
select t, s, i, last_value(i) over (partition by t order by s RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) 
from over10k where (s = 'oscar allen' or s = 'oscar carson') and t = 10;
{code}

> LAST_VALUE Window function returns wrong results
> ------------------------------------------------
>
>                 Key: HIVE-9000
>                 URL: https://issues.apache.org/jira/browse/HIVE-9000
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 0.13.1
>            Reporter: Mark Grover
>            Priority: Critical
>             Fix For: 0.14.1
>
>
> LAST_VALUE Windowing function has been returning bad results, as far as I can tell from
day 1.
> And, it seems like the tests are also asserting that LAST_VALUE gives the wrong result.
> Here's the test output:
> https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587
> The query is:
> {code}
> select t, s, i, last_value(i) over (partition by t order by s) from over10k where (s
= 'oscar allen' or s = 'oscar carson') and t = 10
> {code}
> The result is:
> {code}
> t              s                    i          last_value(i)
> -------------------------------------------------------
> 10	oscar allen	65662	65662
> 10	oscar carson	65549	65549
> {code}
> {{LAST_VALUE( i )}} should have returned 65549 in both records, instead it simply ends
up returning i.
> Another way you can make sure LAST_VALUE is bad is to verify it's result against LEAD(i,1)
over (partition by t order by s). LAST_VALUE being last value should always be more (in terms
of the specified 'order by s') than the lead by 1. While this doesn't directly apply to the
above query, if the result set had more rows, you would clearly see records where lead is
higher than last_value which is semantically incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message