Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 12 Dec 2012 15:19:21 +0000 (UTC)
From: "Namit Jain (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12610869.1349729889817.23439.1355325561921@arcas>
In-Reply-To: <JIRA.12610869.1349729889817@arcas>
References: <JIRA.12610869.1349729889817@arcas>
Subject: [jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for
 performing cubes/rollups/grouping sets for a high number of grouping set
 keys


    [ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530021#comment-13530021 ] 

Namit Jain commented on HIVE-3552:
----------------------------------

Comments addressed; tests passed.
                
> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3552
>                 URL: https://issues.apache.org/jira/browse/HIVE-3552
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch
>
>
> This is a follow-up to HIVE-3433.
> Had an offline discussion with Sambavi; she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say 8. Each input row then generates 2^8 = 256 rows
> for the hash table, which may overwhelm the current group-by implementation.
> A better implementation would be to add an additional MR job: the first
> MR job performs the group by as if there were no cube, and the second MR job
> performs the cube. The assumption is that the group by will have significantly
> reduced the data, and that the rows will arrive in the order of the grouping
> keys, which gives a higher probability of hitting the hash table.
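
The arithmetic behind the proposal can be illustrated with a small in-memory sketch. This is not Hive code: the function names (grouping_masks, project, cube_one_stage, cube_two_stage) are hypothetical, the "hash table" is a plain dict, each grouping set is encoded as a bitmask, and the aggregate is assumed to be SUM, which gives the same answer whether it is applied once to raw rows or re-applied to partial sums.

```python
from collections import defaultdict


def grouping_masks(num_keys):
    """All 2**num_keys grouping sets of a CUBE, as bitmasks.

    Bit i set means key column i is kept; cleared means it is rolled up.
    """
    return range(1 << num_keys)


def project(key, mask):
    """Replace rolled-up key columns with None (NULL in the cube output)."""
    return tuple(k if mask & (1 << i) else None for i, k in enumerate(key))


def cube_one_stage(rows):
    """Expand every input row into every grouping set, then SUM.

    rows: iterable of (key_tuple, value) pairs.
    Returns (result, hash_table_inserts). The mask is part of the result key
    so that a rolled-up column is distinguishable from a genuine NULL value.
    """
    table = defaultdict(int)
    inserts = 0
    for key, value in rows:
        for mask in grouping_masks(len(key)):
            table[(mask, project(key, mask))] += value
            inserts += 1
    return dict(table), inserts


def cube_two_stage(rows):
    """Hypothetical two-job plan: plain GROUP BY first, cube the result second.

    Returns (result, inserts), where inserts counts only the cube expansion
    done in the second stage.
    """
    stage1 = defaultdict(int)
    for key, value in rows:
        stage1[key] += value  # one hash-table entry per distinct full key
    # Assumption: SUM is re-aggregatable, so cubing the partial sums is exact.
    return cube_one_stage(stage1.items())
```

For example, with 1000 copies each of two distinct 3-column keys, the one-stage version performs 2000 × 8 = 16000 hash-table inserts, while the two-stage version first collapses the input to 2 rows and then performs only 2 × 8 = 16 cube inserts; both return the same totals. The saving depends entirely on the assumption above that the first group by shrinks the data substantially.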

--
This message is automatically generated by JIRA.