hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bing Li via Review Board <nore...@reviews.apache.org>
Subject Review Request 60632: HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle
Date Tue, 04 Jul 2017 08:48:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60632/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle


Diffs
-----

  itests/src/test/resources/testconfiguration.properties 19ff316 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RepartitionShuffler.java d0c708c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 5f85f9e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java b9901da 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java afbeccb 
  ql/src/test/queries/clientpositive/spark_explain_groupbyshuffle.q PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_explain_groupbyshuffle.q.out PRE-CREATION



Diff: https://reviews.apache.org/r/60632/diff/1/


Testing
-------

set hive.spark.use.groupby.shuffle=true;
explain select key, count(val) from t1 group by key;

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 2)
      DagName: root_20170625202742_58335619-7107-4026-9911-43d2ec449088:2
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats:
NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats:
NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column
stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column
stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats:
NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats:
NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


set hive.spark.use.groupby.shuffle=false;
explain select key, count(val) from t1 group by key;

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 2)
      DagName: root_20170625203122_3afe01dd-41cc-477e-9098-ddd58b37ad4e:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats:
NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats:
NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column
stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column
stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats:
NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats:
NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


Thanks,

Bing Li


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message