Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 18D5218BFC for ; Fri, 25 Sep 2015 13:15:13 +0000 (UTC) Received: (qmail 46074 invoked by uid 500); 25 Sep 2015 13:15:11 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 46005 invoked by uid 500); 25 Sep 2015 13:15:11 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 45995 invoked by uid 99); 25 Sep 2015 13:15:11 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 25 Sep 2015 13:15:11 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 9563CC28A1 for ; Fri, 25 Sep 2015 13:15:10 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.98 X-Spam-Level: ** X-Spam-Status: No, score=2.98 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 3lcLNkRrImKe for ; Fri, 25 Sep 2015 13:15:02 +0000 (UTC) Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 74A4920F20 for ; Fri, 25 Sep 2015 13:15:02 +0000 (UTC) Received: by wicgb1 with SMTP id gb1so20064754wic.1 for ; Fri, 25 Sep 2015 06:15:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to :content-type; bh=MnHineReCVHRh9b6WWCKirVST1zHeLnLeVHd2tuos5M=; b=UvIimDHrum+s5moSMtXyfEmz/74WND4b9YQPnLRGToVZlaO3cKNcXa1SC0NJ9Conno yZlsSrsEGdfVyb3codEZJbPWY9f7XwLoXdSbnBYosg8D/trmezkLllWnl6lGvf0kx7l3 ZRPNPCALwuB9liMUAn/EXSreFTFQPDfQcnAtVpMC3RyTK2ExR16+HJmgvZS3bMILEPu5 ctMZbO7RibBzaLKqPbiHUEIWGLZsYvwV/M/xfW7B2Jf0t/gFEboVPIr8i9FjTlvENiid mZK8qAv52ZjZ64hn3eImSVe3sx2dm6h35R9xx72Athqc44SiqO4hqGFcOKfDitObU+Lg SzuQ== X-Gm-Message-State: ALoCoQnEwwwVtWjCjoJ5R0NO8TFaaXHKihXyjTiIAYXUF1m6BIhRHKC4xJa/q3dj0oCIseL46ex+ X-Received: by 10.180.85.164 with SMTP id i4mr3613240wiz.54.1443186901035; Fri, 25 Sep 2015 06:15:01 -0700 (PDT) MIME-Version: 1.0 Received: by 10.27.205.10 with HTTP; Fri, 25 Sep 2015 06:14:41 -0700 (PDT) From: Michal Krawczyk Date: Fri, 25 Sep 2015 13:14:41 +0000 Message-ID: Subject: How to use grouping__id in a query To: user@hive.apache.org Content-Type: multipart/alternative; boundary=f46d0445182f94cf05052092227d --f46d0445182f94cf05052092227d Content-Type: text/plain; charset=UTF-8 Hi all, During the migration from Hive 0.11 to 1.0 on Amazon EMR I run to an issue with grouping__id function. I'd like to use it to filter out NULL values that didn't come from grouping sets. Here's an example: We have a simple table with some data: hive> create table grouping_test (col1 string, col2 string); hive> insert into grouping_test values (1, 2), (1, 3), (1, null), (null, 2); hive> select * from grouping_test; OK 1 2 1 3 1 NULL NULL 2 hive> select col1, col2, GROUPING__ID, count(*) from grouping_test group by col1, col2 grouping sets ((), (col1)) having !(col1 IS NULL AND ((CAST(GROUPING__ID as int) & 1) > 0)) I expect the query above to filter out NULL col1 for the col1 grouping set, it used to work on Hive 0.11. But on Hive 1.0 it doesn't filter any values and still returns NULL col1: NULL NULL 0 4 NULL NULL 1 1 <=== this row is expected to be removed by the having clause 1 NULL 1 3 I tried also a few other conditions on grouping__id in having clause and none of them seem to work correctly: select col1, col2, GROUPING__ID, count(*) from grouping_test group by col1, col2 grouping sets ((), (col1)) having GROUPING__ID = '1' This query doesn't return any data. I also tried to embed it into a subquery, but still no luck. It finally worked when I saved the output of the main query to a temp table and filtered out the data using where clause, but this looks like an overkill. So my question is: How to filter out values using grouping__id in Hive 1.0? Thanks for your help, Michal -- Michal Krawczyk Project Manager / Tech Lead Union Square Internet Development http://www.u2i.com/ --f46d0445182f94cf05052092227d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi all,

During the migration from Hive = 0.11 to 1.0 on Amazon EMR I run to an issue with grouping__id function. I'd like to use it t= o filter out NULL values that didn't come from grouping sets. Here'= s an example:

We have a simple table with some dat= a:

hive> create table grouping_test (col1 = string, col2 string);
hive> insert into grouping_test values (= 1, 2), (1, 3), (1, null), (null, 2);
hive> select * from group= ing_test;
OK
1 =C2=A0 =C2=A0 =C2=A0 2
1 =C2= =A0 =C2=A0 =C2=A0 3
1 =C2=A0 =C2=A0 =C2=A0 NULL
NULL = =C2=A0 =C2=A02

hive> select col1, co= l2, GROUPING__ID, count(*)
from grouping_test
group by = col1, col2
grouping sets ((), (col1))
having !(col1 IS = NULL AND ((CAST(GROUPING__ID as int) & 1) > 0))

=
I expect the query above to filter out NULL col1 for the co= l1 grouping set, it used to work on Hive 0.11. But on Hive 1.0 it doesn'= ;t filter any values and still returns NULL col1:

NULL =C2=A0 =C2=A0NULL =C2=A0 =C2=A00 =C2=A0 =C2=A0 =C2=A0 4
NULL =C2=A0 =C2=A0NULL =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 <=3D=3D=3D this row is expected to be removed by th= e having clause
1 =C2=A0 =C2=A0 =C2=A0 NULL =C2=A0 =C2=A01 =C2=A0= =C2=A0 =C2=A0 3

I tried also a few other co= nditions on grouping__id in having clause and none of them seem to work cor= rectly:

select col1, col2, GROUPING__ID, coun= t(*)
from grouping_test
group by col1, col2
g= rouping sets ((), (col1))
having GROUPING__ID =3D '1'

This query doesn't return any data.


I also tried to embed it into a subquery, but still no luck. It= finally worked when I saved the output of the main query to a temp table a= nd filtered out the data using where clause, but this looks like an overkil= l.

So my question is: How to filter out value= s using grouping__id in Hive 1.0?

Thanks for your = help,
Michal


--
Michal = Krawczyk
Project Manager / Tech Lead
Union Square Internet Dev= elopment
http://www.u2= i.com/
--f46d0445182f94cf05052092227d--