hive-issues mailing list archives

From "Zoltan Haindrich (JIRA)" <>
Subject [jira] [Commented] (HIVE-17617) Rollup of an empty resultset should contain the grouping of the empty grouping set
Date Tue, 10 Oct 2017 15:48:01 GMT


Zoltan Haindrich commented on HIVE-17617:

About how it worked earlier:

* In the case of a simple {{select count(1) from x}} there is the implicit {{()}} grouping,
in which case only 1 reducer is spawned. I don't think it would make sense to spawn more
than one.
    ** The summary row was served by the Reducer, based on the facts that there were no input
rows, that it had been closed, and that there were no grouping keys.
* In the case of grouping sets: earlier, when at least one input row made it through the
Mapper, the Mapper emitted 1 row for each grouping set at its output.
     ** If the {{()}} set was present, there was a grouping which collected those - and it just

However, in the case of grouping sets it is possible that multiple reducers effectively split
up the work, even in a simple case where there is only one grouping field.

I'm afraid that setting {{numReducers=1}} could add a performance penalty; I will
look into the code and try to set it only if the empty grouping set is present.
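
To make the behaviour described above concrete, here is a toy model in Python. It is not Hive's code: {{expand}}, {{reduce_all}} and {{GROUPING_SETS}} are hypothetical names, and routing by a hash of the key is only an assumption about how records reach reducers. It shows that several reducers can share the work even with one grouping field, and that with no input rows nothing is emitted for any grouping set, including {{()}}.

```python
from collections import defaultdict

def expand(rows, grouping_sets):
    """Map side: emit one (set_index, key_values, value) record per row and grouping set."""
    for row in rows:
        for i, cols in enumerate(grouping_sets):
            yield i, tuple(row[c] for c in cols), row["c"]

def reduce_all(records, num_reducers=1):
    """Reduce side: route each record by its key, then sum per key on each reducer."""
    reducers = [defaultdict(int) for _ in range(num_reducers)]
    for set_index, key, value in records:
        target = hash((set_index, key)) % num_reducers
        reducers[target][(set_index, key)] += value
    # Collect the per-reducer results into one result set.
    merged = {}
    for partial in reducers:
        merged.update(partial)
    return merged

# What "group by rollup (b)" expands to: the (b) set and the empty set ().
GROUPING_SETS = [("b",), ()]

rows = [{"b": 1, "c": 10}, {"b": 2, "c": 5}, {"b": 1, "c": 7}]
print(reduce_all(expand(rows, GROUPING_SETS), num_reducers=3))

# Empty input: the mapper emits nothing, so no reducer ever sees the () set.
print(reduce_all(expand([], GROUPING_SETS)))  # {}
```

In this model the row for {{()}} exists only because some input row produced it on the map side, which is why an empty table yields no totals row unless something emits it explicitly.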

> Rollup of an empty resultset should contain the grouping of the empty grouping set
> ----------------------------------------------------------------------------------
>                 Key: HIVE-17617
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>         Attachments: HIVE-17617.01.patch, HIVE-17617.03.patch, HIVE-17617.04.patch
> running
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer,c integer);
> select  sum(c),
>         grouping(b)
> from    tx1
> group by rollup (b);
> {code}
> returns 0 rows; however 
> according to the standard:
> The <empty grouping set> is regarded as the shortest such initial sublist. For
> example, “ROLLUP ( (A, B), (C, D) )”
> is equivalent to “GROUPING SETS ( (A, B, C, D), (A, B), () )”.
> so I think the totals row (the grouping for {{()}}) should be present - psql returns
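
The equivalence quoted from the standard can be sketched in a few lines of Python. The function name {{rollup_to_grouping_sets}} is hypothetical and the code only illustrates the rule (every initial sublist, with the empty set as the shortest); it is not taken from Hive.

```python
def rollup_to_grouping_sets(items):
    """Return the grouping sets equivalent to ROLLUP(items), longest prefix first."""
    sets = []
    for n in range(len(items), -1, -1):
        # Flatten the first n rollup items into one grouping set; n == 0 gives ().
        flat = tuple(col for item in items[:n] for col in item)
        sets.append(flat)
    return sets

print(rollup_to_grouping_sets([("A", "B"), ("C", "D")]))
# [('A', 'B', 'C', 'D'), ('A', 'B'), ()]

print(rollup_to_grouping_sets([("b",)]))
# [('b',), ()]
```

The second call corresponds to the {{rollup (b)}} in the query above: the empty grouping set is always part of the expansion.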
