Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative;
 boundary="===============2044482480282087879=="
MIME-Version: 1.0
Subject: Review Request 27955: HIVE-8842 - auto_join2.q produces incorrect
 tree
 [Spark Branch]
From: "Chao Sun" <chao.sun@cloudera.com>
To: "Szehon Ho" <szehon@cloudera.com>, "Xuefu Zhang" <xzhang@cloudera.com>
Cc: "hive" <dev@hive.apache.org>, "Chao Sun" <chao.sun@cloudera.com>
Date: Thu, 13 Nov 2014 02:29:29 -0000
Message-ID: <20141113022929.9703.43625@reviews.apache.org>
Auto-Submitted: auto-generated
Sender: "Chao Sun" <noreply@reviews.apache.org>
Reply-To: "Chao Sun" <chao.sun@cloudera.com>

--===============2044482480282087879==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27955/
-----------------------------------------------------------

Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8842
    https://issues.apache.org/jira/browse/HIVE-8842


Repository: hive-git


Description
-------

Enabling the SparkMapJoinResolver and SparkReduceSinkMapJoinProc, I see the following:

{noformat}
explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key);
{noformat}

This produces too many stages (six) and too many HashTable Sink operators. In the plan below, Stage-7 and Stage-6 are exact duplicates of Stage-5 and Stage-4:
{noformat}
STAGE DEPENDENCIES:
  Stage-5 is a root stage
  Stage-4 depends on stages: Stage-5
  Stage-3 depends on stages: Stage-4
  Stage-7 is a root stage
  Stage-6 depends on stages: Stage-7
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-5
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: src2
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)

  Stage: Stage-4
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
      Vertices:
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: (_col0 + _col5) is not null (type: boolean)
                        Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                        HashTable Sink Operator
                          condition expressions:
                            0 {_col0} {_col1} {_col5} {_col6}
                            1 {key} {value}
                          keys:
                            0 (_col0 + _col5) (type: double)
                            1 UDFToDouble(key) (type: double)

  Stage: Stage-3
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1
      Vertices:
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: src3
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: UDFToDouble(key) is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {_col0} {_col1} {_col5} {_col6}
                        1 {key} {value}
                      keys:
                        0 (_col0 + _col5) (type: double)
                        1 UDFToDouble(key) (type: double)
                      outputColumnNames: _col0, _col1, _col5, _col6, _col10, _col11
                      input vertices:
                        0 Map 3
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string), _col10 (type: string), _col11 (type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                        Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-7
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: src2
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)

  Stage: Stage-6
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
      Vertices:
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: (_col0 + _col5) is not null (type: boolean)
                        Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                        HashTable Sink Operator
                          condition expressions:
                            0 {_col0} {_col1} {_col5} {_col6}
                            1 {key} {value}
                          keys:
                            0 (_col0 + _col5) (type: double)
                            1 UDFToDouble(key) (type: double)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
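
To make the duplication easier to spot in plans like the one above, here is a minimal sketch that groups stages whose plan bodies are textually identical. It is not part of Hive or of this patch: the function name find_duplicate_stages is hypothetical, and it assumes the EXPLAIN output has a "STAGE PLANS:" section with one "Stage: Stage-N" header per stage.

```python
import re
from collections import defaultdict


def find_duplicate_stages(explain_text):
    """Return lists of stage names whose plan bodies are textually identical.

    Hypothetical helper for illustration only; it compares raw EXPLAIN text
    and knows nothing about Hive's internal plan representation.
    """
    stages = {}
    current = None
    in_plans = False
    for line in explain_text.splitlines():
        if line.strip() == "STAGE PLANS:":
            in_plans = True
            continue
        if not in_plans:
            continue  # skip the STAGE DEPENDENCIES section
        header = re.match(r"\s*Stage: (Stage-\d+)\s*$", line)
        if header:
            current = header.group(1)
            stages[current] = []
        elif current is not None and line.strip():
            # Keep indentation (it encodes operator nesting), drop trailing spaces.
            stages[current].append(line.rstrip())

    groups = defaultdict(list)
    for name, body in stages.items():
        groups[tuple(body)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Run against the plan above, this should group Stage-5 with Stage-7 and Stage-4 with Stage-6, since those stage bodies match line for line (including the DagName). A purely textual comparison like this would miss duplicates that differ only in DagName or statistics, so treat it as a quick check rather than a real plan comparison.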


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java a8b7ac6 

Diff: https://reviews.apache.org/r/27955/diff/


Testing
-------


Thanks,

Chao Sun


--===============2044482480282087879==--