hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin Wilfong" <kevinwilf...@fb.com>
Subject Re: Review Request: Support with rollup option for group by
Date Wed, 07 Sep 2011 01:34:25 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1637/
-----------------------------------------------------------

(Updated 2011-09-07 01:34:25.870854)


Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong.


Summary
-------

If a user runs a query that includes group by ... with rollup, the behavior is like that of
MySQL, see the task.

I had to implement 4 different ways of providing this behavior to fit in with the 4 different
ways of implementing group by depending on whether map aggregation is allowed, and whether
the data is known to be skewed.

If map aggregation is allowed, it is a simple matter of adding new keys with an increasing
number of NULLs to the hash map to collect the data for the new rows as part of the map side
hash aggregation.

If map aggregation is not allowed and the data is not skewed, I perform a reduce job which
performs a hash aggregation very similar to the way it is performed on the map side.  I then
perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

If map aggregation is not allowed and the data is skewed and there are no distinct aggregations
for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation.
 This was as simple as adding the new functionality to the map side hash aggregation.

If map aggregation is not allowed and the data is skewed and there are distinct aggregations
fro the group by, I perform a reduce job which performs  a hash aggregation, and then use
the same implementation that is used when there is no rollup option set.

I have done my best not to detract from any optimizations that were made for each of the four
different implementations of group by, but, r for the ones where I add a new reduce job, I
am not sure how successful I was.

Currently, the optimizations for multiple group bys is not supported for queries with the
rollup option set, but I am continuing to look into this.


This addresses bug HIVE-2397.
    https://issues.apache.org/jira/browse/HIVE-2397


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895 
  trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION

  trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION

  trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION

  trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION

  trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION

  trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java
1160895 

Diff: https://reviews.apache.org/r/1637/diff


Testing
-------

For each existing group by test that looked like it might be relevant, I added a corresponding
test with the rollup option set.  (I had found a number of bugs as I was implementing this
with these tests, so I am satisfied that they are indeed relevant)

I also added a new test which provides good coverage of having multiple group by keys with
both distinct and non-distinct aggregations in each of the four different implementations
with the rollup operator set.

I also ran the original group by tests to verify I did not break anything.


Thanks,

Kevin


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message