hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-11133) Support hive.explain.user for Spark
Date Fri, 21 Apr 2017 22:09:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979385#comment-15979385 ]

Sahil Takiar edited comment on HIVE-11133 at 4/21/17 10:08 PM:
---------------------------------------------------------------

[~xuefuz]

The query:

{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.user=true;
set hive.spark.dynamic.partition.pruning=true;

EXPLAIN select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart
union all select min(srcpart.ds) from srcpart);
{code}

Prints

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)

Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Reducer 5 <- Map 4 (GROUP)
Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
Reducer 8 <- Map 7 (GROUP)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_34]
        Group By Operator [GBY_32] (rows=1 width=8)
          Output:["_col0"],aggregations:["count(VALUE._col0)"]
        <-Reducer 2 [GROUP]
          GROUP [RS_31]
            Group By Operator [GBY_30] (rows=1 width=8)
              Output:["_col0"],aggregations:["count()"]
              Join Operator [JOIN_28] (rows=2200 width=10)
                condition map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
              <-Map 1 [PARTITION-LEVEL SORT]
                PARTITION-LEVEL SORT [RS_26]
                  PartitionCols:_col0
                  Select Operator [SEL_2] (rows=2000 width=10)
                    Output:["_col0"]
                    TableScan [TS_0] (rows=2000 width=10)
                      default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
              <-Reducer 6 [PARTITION-LEVEL SORT]
                PARTITION-LEVEL SORT [RS_27]
                  PartitionCols:_col0
                  Group By Operator [GBY_24] (rows=1 width=184)
                    Output:["_col0"],keys:KEY._col0
                  <-Reducer 5 [GROUP]
                    GROUP [RS_23]
                      PartitionCols:_col0
                      Group By Operator [GBY_22] (rows=2 width=184)
                        Output:["_col0"],keys:_col0
                        Filter Operator [FIL_9] (rows=1 width=184)
                          predicate:_col0 is not null
                          Group By Operator [GBY_7] (rows=1 width=184)
                            Output:["_col0"],aggregations:["max(VALUE._col0)"]
                          <-Map 4 [GROUP]
                            GROUP [RS_6]
                              Group By Operator [GBY_5] (rows=1 width=184)
                                Output:["_col0"],aggregations:["max(ds)"]
                                Select Operator [SEL_4] (rows=2000 width=10)
                                  Output:["ds"]
                                  TableScan [TS_3] (rows=2000 width=10)
                                    default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
                  <-Reducer 8 [GROUP]
                    GROUP [RS_23]
                      PartitionCols:_col0
                      Group By Operator [GBY_22] (rows=2 width=184)
                        Output:["_col0"],keys:_col0
                        Filter Operator [FIL_17] (rows=1 width=184)
                          predicate:_col0 is not null
                          Group By Operator [GBY_15] (rows=1 width=184)
                            Output:["_col0"],aggregations:["min(VALUE._col0)"]
                          <-Map 7 [GROUP]
                            GROUP [RS_14]
                              Group By Operator [GBY_13] (rows=1 width=184)
                                Output:["_col0"],aggregations:["min(ds)"]
                                Select Operator [SEL_12] (rows=2000 width=10)
                                  Output:["ds"]
                                  TableScan [TS_11] (rows=2000 width=10)
                                    default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
        Stage-2
          Reducer 11
{code}

So the output contains two sections titled {{Vertex dependency in root stage}}. I haven't checked
whether this can also happen with Hive-on-Tez, but it looks like an existing bug in the user-level
explain code.
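
For reference, a sketch of how the Hive-on-Tez check could be run. This is untested and assumes a
Tez-enabled setup with the same {{srcpart}} test table; {{hive.execution.engine}} and
{{hive.explain.user}} are the standard engine and user-level explain settings, and the remaining
settings from the query above would carry over as needed.

{code}
-- Assumption: Tez is available in this environment; this has not been run as part of this comment.
set hive.execution.engine=tez;
set hive.explain.user=true;
set hive.strict.checks.cartesian.product=false;

EXPLAIN select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart
union all select min(srcpart.ds) from srcpart);
{code}

If that output also shows more than one {{Vertex dependency in root stage}} header, the problem is
likely not specific to Spark.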


> Support hive.explain.user for Spark
> -----------------------------------
>
>                 Key: HIVE-11133
>                 URL: https://issues.apache.org/jira/browse/HIVE-11133
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Mohit Sabharwal
>            Assignee: Sahil Takiar
>         Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, HIVE-11133.6.patch, HIVE-11133.7.patch, HIVE-11133.8.patch
>
>
> User-friendly explain output ({{set hive.explain.user=true}}) should support Spark as well.
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
