hive-dev mailing list archives

From "Arun Gurumurthi (JIRA)" <>
Subject [jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
Date Sat, 31 Jan 2015 19:58:35 GMT


Arun Gurumurthi commented on HIVE-3552:

Hi Namit,

This functionality is great.

Will there be a further enhancement to allow the following?
a) Rollup and cube in formats such as:

current format : group by a, b, c with rollup

New formats:
a) group by rollup(a, b, c) --> this will give the same output as the current format
b) group by rollup((a, b), c) --> this will give output as (a, b, c) / (a, b) / total
c) group by a, rollup((b, c), d) --> this will give output as (a, b, c, d) / (a, b, c) / (a)

Similar functionality for CUBE.

These would let us use rollup instead of spelling out too many combinations with grouping sets
when we want only selected combinations rather than all of them, but there are still too many
to list as grouping sets.
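
For reference, here is a sketch of what formats b) and c) would have to look like today
when written out as explicit grouping sets. It assumes the usual SQL meaning of a
parenthesized column group inside rollup, and it assumes tableA has columns a, b, c and d;
both are illustrative assumptions, not something confirmed for Hive.

```sql
-- b) group by rollup((a, b), c): (a, b, c) / (a, b) / total
SELECT a, b, c, count(*)
FROM tableA
GROUP BY a, b, c GROUPING SETS ((a, b, c), (a, b), ());

-- c) group by a, rollup((b, c), d): (a, b, c, d) / (a, b, c) / (a)
SELECT a, b, c, d, count(*)
FROM tableA
GROUP BY a, b, c, d GROUPING SETS ((a, b, c, d), (a, b, c), a);
```

With only three or four columns the explicit list is short, but it grows quickly as more
columns and column groups are added.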

This functionality is available in RDBMS.

Another request: if we were to use cube and keep only the sets we need instead of all combinations.
I was trying to use grouping__ID to filter only specific sets, and it does not return any rows:

select a, b, c, grouping__ID, count(*)
from tableA
group by a, b, c with cube
having grouping__ID = 2


> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
> -------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-3552
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.11.0
>         Attachments: hive.3552.1.patch, hive.3552.10.patch, hive.3552.11.patch, hive.3552.12.patch,
>                      hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch,
>                      hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch
> This is a follow up for HIVE-3433.
> Had an offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say 8 columns. So, each row would generate 2^8 = 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the order of
> grouping keys which has a higher probability of hitting the hash table.
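
The two-job idea in the description above can be pictured as a manual rewrite of the
cube query. This is only a sketch of the idea, not the plan the patch generates. It
assumes an aggregate that can be combined from partial results (count(*) here), and
the aliases pre and cnt are made up for the illustration.

```sql
-- Inner query (first job): plain group by, as if there were no cube.
-- Outer query (second job): cube over the already reduced rows,
-- summing the partial counts.
SELECT a, b, c, sum(cnt) AS cnt
FROM (
  SELECT a, b, c, count(*) AS cnt
  FROM tableA
  GROUP BY a, b, c
) pre
GROUP BY a, b, c WITH CUBE;
```

The cube expansion then multiplies the number of distinct (a, b, c) groups rather than
the number of input rows, which is where the expected saving comes from.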

This message was sent by Atlassian JIRA
