Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
MIME-Version: 1.0
From: Michal Krawczyk <michal.krawczyk@u2i.com>
Date: Fri, 25 Sep 2015 13:14:41 +0000
Message-ID: 
 <CAKftWoMvZDuJ+9iqC6ty80aw0r=-idisEem1GWSROdS_uY+8Mw@mail.gmail.com>
Subject: How to use grouping__id in a query
To: user@hive.apache.org
Content-Type: multipart/alternative; boundary=f46d0445182f94cf05052092227d

--f46d0445182f94cf05052092227d
Content-Type: text/plain; charset=UTF-8

Hi all,

During the migration from Hive 0.11 to 1.0 on Amazon EMR I ran into an issue
with the grouping__id function. I'd like to use it to filter out NULL values
that come from the underlying data rather than from the grouping sets. Here's
an example:

We have a simple table with some data:

hive> create table grouping_test (col1 string, col2 string);
hive> insert into grouping_test values (1, 2), (1, 3), (1, null), (null, 2);
hive> select * from grouping_test;
OK
1       2
1       3
1       NULL
NULL    2

hive> select col1, col2, GROUPING__ID, count(*)
from grouping_test
group by col1, col2
grouping sets ((), (col1))
having !(col1 IS NULL AND ((CAST(GROUPING__ID as int) & 1) > 0))

I expect the query above to filter out the NULL col1 row for the col1 grouping
set; this used to work on Hive 0.11. But on Hive 1.0 it doesn't filter out any
rows and still returns the NULL col1 row:

NULL    NULL    0       4
NULL    NULL    1       1         <=== this row is expected to be removed by the having clause
1       NULL    1       3

I also tried a few other conditions on grouping__id in the having clause, and
none of them seems to work correctly:

select col1, col2, GROUPING__ID, count(*)
from grouping_test
group by col1, col2
grouping sets ((), (col1))
having GROUPING__ID = '1'

This query doesn't return any data.


I also tried embedding it in a subquery, but still no luck. It finally
worked when I saved the output of the main query to a temp table and
filtered the data with a where clause, but this seems like overkill.
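
A minimal sketch of that temp-table workaround, assuming a
create-table-as-select step; the table name grouping_tmp and the aliases gid
and cnt are illustrative placeholders, not the exact statements that were run:

```sql
-- Step 1: materialize the grouping-sets result, keeping GROUPING__ID
-- as an ordinary column (gid is a hypothetical alias).
create table grouping_tmp as
select col1, col2, GROUPING__ID as gid, count(*) as cnt
from grouping_test
group by col1, col2
grouping sets ((), (col1));

-- Step 2: apply the same condition as the original having clause,
-- but as a plain where filter on the stored column.
select col1, col2, gid, cnt
from grouping_tmp
where !(col1 IS NULL AND ((CAST(gid as int) & 1) > 0));
```

Once the value is stored in a regular column, the filter no longer depends
on how the having clause evaluates GROUPING__ID.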

So my question is: how can I filter out values using grouping__id in Hive 1.0?

Thanks for your help,
Michal


-- 
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/


--f46d0445182f94cf05052092227d--