hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michal Krawczyk <michal.krawc...@u2i.com>
Subject Re: How to use grouping__id in a query
Date Tue, 27 Oct 2015 15:44:43 GMT
Thanks Jesus, sorry for delayed response. Looking forward to the fix ;).

On Wed, Oct 21, 2015 at 6:03 PM, Jesus Camacho Rodriguez <
jcamachorodriguez@hortonworks.com> wrote:

> I created HIVE-12223 to track this issue.
>
> Thanks,
> Jesús
>
>
> From: Jesus Camachorodriguez
> Reply-To: "user@hive.apache.org"
> Date: Friday, October 16, 2015 at 8:00 AM
>
> To: "user@hive.apache.org"
> Subject: Re: How to use grouping__id in a query
>
> Hi Michal,
>
> Sorry I didn't catch your message before. The change of behavior might be
> due to a bug; certainly we should filter or at least produce a proper error.
>
> Could you file a JIRA case and assign it to me? I'll check further.
>
> Thanks,
> Jesús
>
>
>
> From: Michal Krawczyk
> Reply-To: "user@hive.apache.org"
> Date: Friday, October 16, 2015 at 8:15 AM
> To: "user@hive.apache.org"
> Subject: Re: How to use grouping__id in a query
>
> Hi all,
>
> Unfortunately I didn't get any answer on this one, perhaps I asked the
> question incorrectly. I'll try another one then ;).
>
> Should it be possible to use grouping__id function in having clause to
> filter our null values in the same query. It used to work in Hive 0.11 and
> 0.13, but doesn't work in Hive 1.0.
>
> Thanks,
> Michal
>
> On Fri, Sep 25, 2015 at 1:14 PM, Michal Krawczyk <michal.krawczyk@u2i.com>
> wrote:
>
>> Hi all,
>>
>> During the migration from Hive 0.11 to 1.0 on Amazon EMR I run to an
>> issue with grouping__id function. I'd like to use it to filter out NULL
>> values that didn't come from grouping sets. Here's an example:
>>
>> We have a simple table with some data:
>>
>> hive> create table grouping_test (col1 string, col2 string);
>> hive> insert into grouping_test values (1, 2), (1, 3), (1, null), (null,
>> 2);
>> hive> select * from grouping_test;
>> OK
>> 1       2
>> 1       3
>> 1       NULL
>> NULL    2
>>
>> hive> select col1, col2, GROUPING__ID, count(*)
>> from grouping_test
>> group by col1, col2
>> grouping sets ((), (col1))
>> having !(col1 IS NULL AND ((CAST(GROUPING__ID as int) & 1) > 0))
>>
>> I expect the query above to filter out NULL col1 for the col1 grouping
>> set, it used to work on Hive 0.11. But on Hive 1.0 it doesn't filter any
>> values and still returns NULL col1:
>>
>> NULL    NULL    0       4
>> NULL    NULL    1       1         <=== this row is expected to be removed
>> by the having clause
>> 1       NULL    1       3
>>
>> I tried also a few other conditions on grouping__id in having clause and
>> none of them seem to work correctly:
>>
>> select col1, col2, GROUPING__ID, count(*)
>> from grouping_test
>> group by col1, col2
>> grouping sets ((), (col1))
>> having GROUPING__ID = '1'
>>
>> This query doesn't return any data.
>>
>>
>> I also tried to embed it into a subquery, but still no luck. It finally
>> worked when I saved the output of the main query to a temp table and
>> filtered out the data using where clause, but this looks like an overkill.
>>
>> So my question is: How to filter out values using grouping__id in Hive
>> 1.0?
>>
>> Thanks for your help,
>> Michal
>>
>>
>> --
>> Michal Krawczyk
>> Project Manager / Tech Lead
>> Union Square Internet Development
>> http://www.u2i.com/
>>
>
>
>
> --
> Michal Krawczyk
> Project Manager / Tech Lead
> Union Square Internet Development
> http://www.u2i.com/
>



-- 
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/

Mime
View raw message