Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Mon, 23 Oct 2017 05:25:00 +0000 (UTC)
From: "liyunzhang_intel (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.13090679.1501212507000.31291.1508736300106@Atlassian.JIRA>
In-Reply-To: <JIRA.13090679.1501212507000@Atlassian.JIRA>
References: <JIRA.13090679.1501212507000@Atlassian.JIRA> <JIRA.13090679.1501212507790@jira-lw-us.apache.org>
Subject: [jira] [Comment Edited] (HIVE-17193) HoS: don't combine map works
 that are targets of different DPPs
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 23 Oct 2017 05:25:06 -0000


    [ https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214644#comment-16214644 ] 

liyunzhang_intel edited comment on HIVE-17193 at 10/23/17 5:24 AM:
-------------------------------------------------------------------

I can reproduce after disabling cbo
{code}

set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
set hive.cbo.enable=false;
explain
select * from
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
on a.key=b.key;
{code}

the explain
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:2
      Vertices:
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Spark Partition Pruning Sink Operator
                          Target column: ds (string)
                          partition key expr: ds
                          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                          target work: Map 1

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1)
        Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 (PARTITION-LEVEL SORT, 1)
        Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL SORT, 1)
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: ds (type: string)
                      sort order: +
                      Map-reduce partition columns: ds (type: string)
                      Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                      value expressions: key (type: string)
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Map 7 
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: value is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: value (type: string)
                      sort order: +
                      Map-reduce partition columns: value (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Reducer 2 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 key (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)
        Reducer 3 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col1 (type: string)
                  1 _col1 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 6 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 value (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}

There is only 1 Map about srcpart. The reason why the maps about srcpart can not be merged when enabling cbo is because the RS in Maps are considered different while they are considered same when disabling cbo(see attached [picture|https://issues.apache.org/jira/secure/attachment/12893484/17193_compare_RS_in_Map_5_1.PNG])


was (Author: kellyzly):
I can reproduce after disabling cbo
{code}

set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
set hive.cbo.enable=false;
explain
select * from
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
on a.key=b.key;
{code}

the explain
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:2
      Vertices:
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Spark Partition Pruning Sink Operator
                          Target column: ds (string)
                          partition key expr: ds
                          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                          target work: Map 1

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1)
        Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 (PARTITION-LEVEL SORT, 1)
        Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL SORT, 1)
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: ds (type: string)
                      sort order: +
                      Map-reduce partition columns: ds (type: string)
                      Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                      value expressions: key (type: string)
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Map 7 
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: value is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: value (type: string)
                      sort order: +
                      Map-reduce partition columns: value (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Reducer 2 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 key (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)
        Reducer 3 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col1 (type: string)
                  1 _col1 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 6 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 value (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}

There is only 1 Map about srcpart. The reason why the maps about srcpart can not be merged when enabling cbo is because the RS in Maps are considered different while they are considered same when disabling cbo(see attached picture)

> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
>                 Key: HIVE-17193
>                 URL: https://issues.apache.org/jira/browse/HIVE-17193
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)