Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 78DE28E6C for ; Wed, 24 Aug 2011 17:43:02 +0000 (UTC) Received: (qmail 67453 invoked by uid 500); 24 Aug 2011 17:43:02 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 67380 invoked by uid 500); 24 Aug 2011 17:43:01 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 67372 invoked by uid 500); 24 Aug 2011 17:43:01 -0000 Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org Received: (qmail 67369 invoked by uid 99); 24 Aug 2011 17:43:01 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 24 Aug 2011 17:43:01 +0000 X-ASF-Spam-Status: No, hits=-2000.9 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 24 Aug 2011 17:42:57 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id C5803CDAA2 for ; Wed, 24 Aug 2011 17:42:35 +0000 (UTC) Date: Wed, 24 Aug 2011 17:42:35 +0000 (UTC) From: "jiraposter@reviews.apache.org (JIRA)" To: hive-dev@hadoop.apache.org Message-ID: <495916498.10021.1314207755805.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <781555448.54017.1313782467325.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HIVE-2397) Support with rollup option for group by MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HIVE-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090361#comment-13090361 ] jiraposter@reviews.apache.org commented on HIVE-2397: ----------------------------------------------------- ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- Review request for hive, Ning Zhang and Siying Dong. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397. https://issues.apache.org/jira/browse/HIVE-2397 Diffs ----- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin > Support with rollup option for group by > --------------------------------------- > > Key: HIVE-2397 > URL: https://issues.apache.org/jira/browse/HIVE-2397 > Project: Hive > Issue Type: New Feature > Reporter: Kevin Wilfong > Assignee: Kevin Wilfong > > We should support the ROLLUP operator similar to the way MySQL is implemented. > Exerted from MySQL documents: > mysql> SELECT year, country, product, SUM(profit) > -> FROM sales > -> GROUP BY year, country, product WITH ROLLUP; > +------+---------+------------+-------------+ > | year | country | product | SUM(profit) | > +------+---------+------------+-------------+ > | 2000 | Finland | Computer | 1500 | > | 2000 | Finland | Phone | 100 | > | 2000 | Finland | NULL | 1600 | > | 2000 | India | Calculator | 150 | > | 2000 | India | Computer | 1200 | > | 2000 | India | NULL | 1350 | > | 2000 | USA | Calculator | 75 | > | 2000 | USA | Computer | 1500 | > | 2000 | USA | NULL | 1575 | > | 2000 | NULL | NULL | 4525 | > | 2001 | Finland | Phone | 10 | > | 2001 | Finland | NULL | 10 | > | 2001 | USA | Calculator | 50 | > | 2001 | USA | Computer | 2700 | > | 2001 | USA | TV | 250 | > | 2001 | USA | NULL | 3000 | > | 2001 | NULL | NULL | 3010 | > | NULL | NULL | NULL | 7535 | > +------+---------+------------+-------------+ > http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira