hive-user mailing list archives

From Michal Krawczyk <michal.krawc...@u2i.com>
Subject Re: Possible bug with max() together with rank() and grouping sets
Date Wed, 22 Oct 2014 07:39:44 GMT
Not sure. The issue you mentioned requires specifying additional columns,
whereas the one I mentioned returns obviously incorrect results, which seems
to be a much more severe issue.

Can anybody try to replicate this? If it also happens on non-Amazon Hive,
I'll file a bug report in Jira.
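
For anyone replicating this, it may help to have a reference for what the
query ought to return. The sketch below is not Hive: it uses Python's
sqlite3 module (assuming a bundled SQLite of 3.25 or later, for window
functions) and emulates GROUPING SETS ((), (category)) with UNION ALL,
because SQLite has no GROUPING SETS. The table and data are copied from the
original message; the function name expected_rows is made up for
illustration, and the HAVING filter is expressed as a WHERE on the
aggregated rows, which should be equivalent here.

```python
import sqlite3

# Data copied from the original report: (category, live, comments).
ROWS = [(3, 0, 2), (2, 0, 2), (8, 0, 2)]

# SQLite has no GROUPING SETS, so the two grouping sets are emulated with
# UNION ALL: () is the grand total and (category) is the per-category group.
# The HAVING max(comments) > 0 filter becomes a WHERE on the aggregated rows.
QUERY = """
SELECT category, live, comments,
       rank() OVER (PARTITION BY category ORDER BY comments) AS rank1
FROM (
    SELECT NULL AS category, max(live) AS live, max(comments) AS comments
    FROM t
    UNION ALL
    SELECT category, max(live), max(comments)
    FROM t
    GROUP BY category
)
WHERE comments > 0
ORDER BY category
"""


def expected_rows():
    """Return the rows the reported query should produce (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (category INTEGER, live INTEGER, comments INTEGER)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in expected_rows():
        print(row)
```

Since every row in the table has live = 0 and comments = 2, any correct
engine should report max(live) = 0 and max(comments) = 2 in all four output
rows, which is what makes the values 1 and 48/49 in the Hive output look
wrong.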

On Tue, Oct 21, 2014 at 4:01 PM, j.barrett Strausser <
j.barrett.strausser@gmail.com> wrote:

> Perhaps related to https://issues.apache.org/jira/browse/HIVE-4663
>
> I ran across similar issues in 0.11; not sure whether the above ticket
> affects 0.13.
>
> On Tue, Oct 21, 2014 at 8:21 AM, Michal Krawczyk <michal.krawczyk@u2i.com>
> wrote:
>
>> Hi all,
>>
>> Recently I've run into a problem with incorrect results in one of the
>> queries on our system after upgrading from Hive 0.8.1.4 to 0.13.1. We use
>> the Amazon Elastic MapReduce service. I tried to simplify the original
>> query and replicate the issue on a small dataset. Please take a look at
>> the queries below and let me know your thoughts.
>>
>> I have the following table:
>> CREATE  TABLE `t`(
>>   `category` int,
>>   `live` int,
>>   `comments` int)
>>
>> with the following data:
>> hive> select * from t;
>> OK
>> 3       0       2
>> 2       0       2
>> 8       0       2
>>
>> The query:
>> hive> select category, max(live) live, max(comments) comments, rank()
>> OVER (PARTITION BY category ORDER BY comments) rank1
>> FROM t
>> GROUP BY category
>> GROUPING SETS ((), (category))
>> HAVING max(comments) > 0;
>>
>> returns the following results:
>>
>> NULL    1       48      1
>> 2       1       49      1
>> 3       1       49      1
>> 8       1       49      1
>>
>> Long story short, when grouping sets are used together with the rank()
>> function, the max() function returns incorrect results. Everything works
>> fine if I remove the grouping sets clause and split the query into two
>> independent queries, or if I remove the rank() function.
>>
>> This looks like a bug to me, but please review. That said, I'm not sure
>> whether it's just an Amazon issue or a general Hive issue.
>>
>> Thanks,
>> Michal
>>
>> --
>> Michal Krawczyk
>> Project Manager / Tech Lead
>> Union Square Internet Development
>> http://www.u2i.com/
>>
>
>
>
> --
>
>
> https://github.com/bearrito
> @deepbearrito
>



-- 
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/
