hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present
Date Wed, 17 Sep 2014 19:13:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137768#comment-14137768
] 

Mostafa Mokhtar commented on HIVE-7985:
---------------------------------------

Q56 also has the same issues 
Query 
{code}
with ss as (
 select i_item_id,sum(ss_ext_sales_price) total_sales
 from
        store_sales,
        date_dim,
         customer_address,
         item
 where item.i_item_id in (select
     i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
 and     ss_item_sk              = i_item_sk
 and     ss_sold_date_sk         = d_date_sk
 and     d_year                  = 2001
 and     d_moy                   = 1
 and     ss_addr_sk              = ca_address_sk
 and     ca_gmt_offset           = -6 
 group by i_item_id),
 cs as (
 select i_item_id,sum(cs_ext_sales_price) total_sales
 from
        catalog_sales,
        date_dim,
         customer_address,
         item
 where
         item.i_item_id               in (select
  i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
 and     cs_item_sk              = i_item_sk
 and     cs_sold_date_sk         = d_date_sk
 and     d_year                  = 2001
 and     d_moy                   = 1
 and     cs_bill_addr_sk         = ca_address_sk
 and     ca_gmt_offset           = -6 
 group by i_item_id),
 ws as (
 select i_item_id,sum(ws_ext_sales_price) total_sales
 from
        web_sales,
        date_dim,
         customer_address,
         item
 where
         item.i_item_id               in (select
  i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
 and     ws_item_sk              = i_item_sk
 and     ws_sold_date_sk         = d_date_sk
 and     d_year                  = 2001
 and     d_moy                   = 1
 and     ws_bill_addr_sk         = ca_address_sk
 and     ca_gmt_offset           = -6
 group by i_item_id)
  select  i_item_id ,sum(total_sales) total_sales
 from  (select * from ss 
        union all
        select * from cs 
        union all
        select * from ws) tmp1
 group by i_item_id
 order by total_sales
 limit 100
{code}


Plan 
{code}
Warning: Map Join MAPJOIN[177][bigTable=?] in task 'Map 8' is a cross product
Warning: Map Join MAPJOIN[178][bigTable=?] in task 'Map 21' is a cross product
Warning: Map Join MAPJOIN[179][bigTable=web_sales] in task 'Map 14' is a cross product
Warning: Map Join MAPJOIN[180][bigTable=?] in task 'Map 20' is a cross product
Warning: Map Join MAPJOIN[181][bigTable=?] in task 'Map 22' is a cross product
Warning: Map Join MAPJOIN[182][bigTable=store_sales] in task 'Map 17' is a cross product
Warning: Map Join MAPJOIN[174][bigTable=?] in task 'Map 13' is a cross product
Warning: Map Join MAPJOIN[175][bigTable=?] in task 'Map 10' is a cross product
Warning: Map Join MAPJOIN[176][bigTable=catalog_sales] in task 'Map 2' is a cross product
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 10 <- Map 13 (BROADCAST_EDGE)
        Map 13 <- Map 16 (BROADCAST_EDGE)
        Map 14 <- Map 11 (BROADCAST_EDGE), Map 21 (BROADCAST_EDGE)
        Map 17 <- Map 12 (BROADCAST_EDGE), Map 22 (BROADCAST_EDGE)
        Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE)
        Map 20 <- Map 9 (BROADCAST_EDGE)
        Map 21 <- Map 8 (BROADCAST_EDGE)
        Map 22 <- Map 20 (BROADCAST_EDGE)
        Map 8 <- Map 19 (BROADCAST_EDGE)
        Reducer 15 <- Map 14 (SIMPLE_EDGE), Union 4 (CONTAINS)
        Reducer 18 <- Map 17 (SIMPLE_EDGE), Union 4 (CONTAINS)
        Reducer 3 <- Map 2 (SIMPLE_EDGE), Union 4 (CONTAINS)
        Reducer 5 <- Union 4 (SIMPLE_EDGE)
        Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
        Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
      DagName: mmokhtar_20140916234444_107f445a-d48f-4a42-89a1-8c53eaa8dec0:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: i
                  filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
                    Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                    Select Operator
                      expressions: i_item_id (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col0 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: string)
                          Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
            Execution mode: vectorized
        Map 10 
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean)
                  Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE
Column stats: NONE
                  Filter Operator
                    predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean)
                    Statistics: Num rows: 400000 Data size: 405951844 Basic stats: COMPLETE
Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {_col0} {_col31} {_col32}
                        1 {ca_address_sk}
                      keys:
                        0 
                        1 
                      outputColumnNames: _col0, _col31, _col32, _col56
                      input vertices:
                        0 Map 13
                      Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE
Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE
Column stats: NONE
                        value expressions: _col0 (type: int), _col31 (type: int), _col32 (type:
string), _col56 (type: int)
            Execution mode: vectorized
        Map 11 
            Map Operator Tree:
                TableScan
                  alias: i
                  filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
                    Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                    Select Operator
                      expressions: i_item_id (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col0 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: string)
                          Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
            Execution mode: vectorized
        Map 12 
            Map Operator Tree:
                TableScan
                  alias: i
                  filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
                    Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                    Select Operator
                      expressions: i_item_id (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col0 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: string)
                          Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE
Column stats: NONE
            Execution mode: vectorized
        Map 13 
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: i_item_id is not null (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: i_item_id is not null (type: boolean)
                    Statistics: Num rows: 24000 Data size: 34366356 Basic stats: COMPLETE
Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {d_date_sk}
                        1 {i_item_sk} {i_item_id}
                      keys:
                        0 
                        1 
                      outputColumnNames: _col0, _col31, _col32
                      input vertices:
                        0 Map 16
                      Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE
Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE
Column stats: NONE
                        value expressions: _col0 (type: int), _col31 (type: int), _col32 (type:
string)
            Execution mode: vectorized
        Map 14 
            Map Operator Tree:
                TableScan
                  alias: web_sales
                  Statistics: Num rows: 143966864 Data size: 19001610332 Basic stats: COMPLETE
Column stats: NONE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {_col0} {_col31} {_col32} {_col56}
                      1 {ws_item_sk} {ws_bill_addr_sk} {ws_ext_sales_price} {ws_sold_date_sk}
                    keys:
                      0 
                      1 
                    outputColumnNames: _col0, _col31, _col32, _col56, _col74, _col78, _col94,
_col105
                    input vertices:
                      0 Map 21
                    Statistics: Num rows: 158363552 Data size: 20901771264 Basic stats: COMPLETE
Column stats: NONE
                    Filter Operator
                      predicate: (((_col74 = _col31) and (_col105 = _col0)) and (_col78 =
_col56)) (type: boolean)
                      Statistics: Num rows: 19795444 Data size: 2612721408 Basic stats: COMPLETE
Column stats: NONE
                      Select Operator
                        expressions: _col94 (type: float), _col32 (type: string)
                        outputColumnNames: _col22, _col85
                        Statistics: Num rows: 19795444 Data size: 2612721408 Basic stats:
COMPLETE Column stats: NONE
                        Map Join Operator
                          condition map:
                               Left Semi Join 0 to 1
                          condition expressions:
                            0 {_col22} {_col85}
                            1 
                          keys:
                            0 _col85 (type: string)
                            1 _col0 (type: string)
                          outputColumnNames: _col22, _col85
                          input vertices:
                            1 Map 11
                          Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats:
COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col85 (type: string), _col22 (type: float)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats:
COMPLETE Column stats: NONE
                            Group By Operator
                              aggregations: sum(_col1)
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0, _col1
                              Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats:
COMPLETE Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col0 (type: string)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: string)
                                Statistics: Num rows: 21774988 Data size: 2873993728 Basic
stats: COMPLETE Column stats: NONE
                                value expressions: _col1 (type: double)
            Execution mode: vectorized
        Map 16 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
                    Reduce Output Operator
                      sort order: 
                      Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
                      value expressions: d_date_sk (type: int)
            Execution mode: vectorized
        Map 17 
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE
Column stats: NONE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {_col0} {_col31} {_col32} {_col56}
                      1 {ss_item_sk} {ss_addr_sk} {ss_ext_sales_price} {ss_sold_date_sk}
                    keys:
                      0 
                      1 
                    outputColumnNames: _col0, _col31, _col32, _col56, _col73, _col77, _col86,
_col94
                    input vertices:
                      0 Map 22
                    Statistics: Num rows: 605084224 Data size: 52107022336 Basic stats: COMPLETE
Column stats: NONE
                    Filter Operator
                      predicate: (((_col73 = _col31) and (_col94 = _col0)) and (_col77 = _col56))
(type: boolean)
                      Statistics: Num rows: 75635528 Data size: 6513377792 Basic stats: COMPLETE
Column stats: NONE
                      Select Operator
                        expressions: _col86 (type: float), _col32 (type: string)
                        outputColumnNames: _col14, _col74
                        Statistics: Num rows: 75635528 Data size: 6513377792 Basic stats:
COMPLETE Column stats: NONE
                        Map Join Operator
                          condition map:
                               Left Semi Join 0 to 1
                          condition expressions:
                            0 {_col14} {_col74}
                            1 
                          keys:
                            0 _col74 (type: string)
                            1 _col0 (type: string)
                          outputColumnNames: _col14, _col74
                          input vertices:
                            1 Map 12
                          Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats:
COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col74 (type: string), _col14 (type: float)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats:
COMPLETE Column stats: NONE
                            Group By Operator
                              aggregations: sum(_col1)
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0, _col1
                              Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats:
COMPLETE Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col0 (type: string)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: string)
                                Statistics: Num rows: 83199080 Data size: 7164715520 Basic
stats: COMPLETE Column stats: NONE
                                value expressions: _col1 (type: double)
            Execution mode: vectorized
        Map 19 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
                    Reduce Output Operator
                      sort order: 
                      Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
                      value expressions: d_date_sk (type: int)
            Execution mode: vectorized
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: catalog_sales
                  Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE
Column stats: NONE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {_col0} {_col31} {_col32} {_col56}
                      1 {cs_bill_addr_sk} {cs_item_sk} {cs_ext_sales_price} {cs_sold_date_sk}
                    keys:
                      0 
                      1 
                    outputColumnNames: _col0, _col31, _col32, _col56, _col77, _col86, _col94,
_col105
                    input vertices:
                      0 Map 10
                    Statistics: Num rows: 315204704 Data size: 41518358528 Basic stats: COMPLETE
Column stats: NONE
                    Filter Operator
                      predicate: (((_col86 = _col31) and (_col105 = _col0)) and (_col77 =
_col56)) (type: boolean)
                      Statistics: Num rows: 39400588 Data size: 5189794816 Basic stats: COMPLETE
Column stats: NONE
                      Select Operator
                        expressions: _col94 (type: float), _col32 (type: string)
                        outputColumnNames: _col22, _col85
                        Statistics: Num rows: 39400588 Data size: 5189794816 Basic stats:
COMPLETE Column stats: NONE
                        Map Join Operator
                          condition map:
                               Left Semi Join 0 to 1
                          condition expressions:
                            0 {_col22} {_col85}
                            1 
                          keys:
                            0 _col85 (type: string)
                            1 _col0 (type: string)
                          outputColumnNames: _col22, _col85
                          input vertices:
                            1 Map 1
                          Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats:
COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col85 (type: string), _col22 (type: float)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats:
COMPLETE Column stats: NONE
                            Group By Operator
                              aggregations: sum(_col1)
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0, _col1
                              Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats:
COMPLETE Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col0 (type: string)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: string)
                                Statistics: Num rows: 43340648 Data size: 5708774400 Basic
stats: COMPLETE Column stats: NONE
                                value expressions: _col1 (type: double)
            Execution mode: vectorized
        Map 20 
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: i_item_id is not null (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: i_item_id is not null (type: boolean)
                    Statistics: Num rows: 24000 Data size: 34366356 Basic stats: COMPLETE
Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {d_date_sk}
                        1 {i_item_sk} {i_item_id}
                      keys:
                        0 
                        1 
                      outputColumnNames: _col0, _col31, _col32
                      input vertices:
                        0 Map 9
                      Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE
Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE
Column stats: NONE
                        value expressions: _col0 (type: int), _col31 (type: int), _col32 (type:
string)
            Execution mode: vectorized
        Map 21 
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean)
                  Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE
Column stats: NONE
                  Filter Operator
                    predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean)
                    Statistics: Num rows: 400000 Data size: 405951844 Basic stats: COMPLETE
Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {_col0} {_col31} {_col32}
                        1 {ca_address_sk}
                      keys:
                        0 
                        1 
                      outputColumnNames: _col0, _col31, _col32, _col56
                      input vertices:
                        0 Map 8
                      Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE
Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE
Column stats: NONE
                        value expressions: _col0 (type: int), _col31 (type: int), _col32 (type:
string), _col56 (type: int)
            Execution mode: vectorized
        Map 22 
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean)
                  Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE
Column stats: NONE
                  Filter Operator
                    predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean)
                    Statistics: Num rows: 400000 Data size: 405951844 Basic stats: COMPLETE
Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {_col0} {_col31} {_col32}
                        1 {ca_address_sk}
                      keys:
                        0 
                        1 
                      outputColumnNames: _col0, _col31, _col32, _col56
                      input vertices:
                        0 Map 20
                      Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE
Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE
Column stats: NONE
                        value expressions: _col0 (type: int), _col31 (type: int), _col32 (type:
string), _col56 (type: int)
            Execution mode: vectorized
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: i_item_id is not null (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: i_item_id is not null (type: boolean)
                    Statistics: Num rows: 24000 Data size: 34366356 Basic stats: COMPLETE
Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {d_date_sk}
                        1 {i_item_sk} {i_item_id}
                      keys:
                        0 
                        1 
                      outputColumnNames: _col0, _col31, _col32
                      input vertices:
                        0 Map 19
                      Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE
Column stats: NONE
                      Reduce Output Operator
                        sort order: 
                        Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE
Column stats: NONE
                        value expressions: _col0 (type: int), _col31 (type: int), _col32 (type:
string)
            Execution mode: vectorized
        Map 9 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
                    Reduce Output Operator
                      sort order: 
                      Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
                      value expressions: d_date_sk (type: int)
            Execution mode: vectorized
        Reducer 15 
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: double)
                  outputColumnNames: _col0, _col1
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: double)
                    outputColumnNames: _col0, _col1
                    Group By Operator
                      aggregations: sum(_col1)
                      keys: _col0 (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        value expressions: _col1 (type: double)
        Reducer 18 
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: double)
                  outputColumnNames: _col0, _col1
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: double)
                    outputColumnNames: _col0, _col1
                    Select Operator
                      expressions: _col0 (type: string), _col1 (type: double)
                      outputColumnNames: _col0, _col1
                      Group By Operator
                        aggregations: sum(_col1)
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0, _col1
                        Reduce Output Operator
                          key expressions: _col0 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: string)
                          value expressions: _col1 (type: double)
        Reducer 3 
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: double)
                  outputColumnNames: _col0, _col1
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: double)
                    outputColumnNames: _col0, _col1
                    Select Operator
                      expressions: _col0 (type: string), _col1 (type: double)
                      outputColumnNames: _col0, _col1
                      Group By Operator
                        aggregations: sum(_col1)
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0, _col1
                        Reduce Output Operator
                          key expressions: _col0 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: string)
                          value expressions: _col1 (type: double)
        Reducer 5 
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE
Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: double)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE
Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: double)
                    sort order: +
                    Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE
Column stats: NONE
                    TopN Hash Memory Usage: 0.04
                    value expressions: _col0 (type: string)
            Execution mode: vectorized
        Reducer 6 
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: double)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE
Column stats: NONE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column
stats: NONE
                  Reduce Output Operator
                    sort order: 
                    Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column
stats: NONE
                    TopN Hash Memory Usage: 0.04
                    value expressions: _col0 (type: string), _col1 (type: double)
            Execution mode: vectorized
        Reducer 7 
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: string), VALUE._col1 (type: double)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats:
NONE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column
stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: double)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column
stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column
stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized
        Union 4 
            Vertex: Union 4

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}

> With CBO enabled cross product is generated when a subquery is present
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7985
>                 URL: https://issues.apache.org/jira/browse/HIVE-7985
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Laljo John Pullokkaran
>             Fix For: 0.14.0
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products
> Query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, item
where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian'))
and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1
group by i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 3 (BROADCAST_EDGE)
>         Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
>         Reducer 5 <- Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: item
>                   filterExpr: (true and i_item_id is not null) (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE
Column stats: NONE
>                   Filter Operator
>                     predicate: i_item_id is not null (type: boolean)
>                     Statistics: Num rows: 231000 Data size: 331931080 Basic stats: COMPLETE
Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {i_item_sk} {i_item_id}
>                         1 {d_date_sk}
>                       keys:
>                         0
>                         1
>                       outputColumnNames: _col0, _col1, _col25
>                       input vertices:
>                         1 Map 3
>                       Statistics: Num rows: 254100 Data size: 365124192 Basic stats:
COMPLETE Column stats: NONE
>                       Select Operator
>                         expressions: _col0 (type: int), _col1 (type: string), _col25
(type: int)
>                         outputColumnNames: _col0, _col1, _col25
>                         Statistics: Num rows: 254100 Data size: 365124192 Basic stats:
COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           sort order:
>                           Statistics: Num rows: 254100 Data size: 365124192 Basic stats:
COMPLETE Column stats: NONE
>                           value expressions: _col0 (type: int), _col1 (type: string),
_col25 (type: int)
>             Execution mode: vectorized
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: i
>                   filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE
Column stats: NONE
>                   Filter Operator
>                     predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id
is not null) (type: boolean)
>                     Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE
Column stats: NONE
>                     Select Operator
>                       expressions: i_item_id (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 115500 Data size: 165965540 Basic stats:
COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 115500 Data size: 165965540 Basic stats:
COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: string)
>                           sort order: +
>                           Map-reduce partition columns: _col0 (type: string)
>                           Statistics: Num rows: 115500 Data size: 165965540 Basic stats:
COMPLETE Column stats: NONE
>             Execution mode: vectorized
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE
Column stats: NONE
>                   Filter Operator
>                     predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
>                     Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
>                     Reduce Output Operator
>                       sort order:
>                       Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE
Column stats: NONE
>                       value expressions: d_date_sk (type: int)
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   Statistics: Num rows: 82510879939 Data size: 7203833257964 Basic stats:
COMPLETE Column stats: NONE
>                   Map Join Operator
>                     condition map:
>                          Inner Join 0 to 1
>                     condition expressions:
>                       0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price}
>                       1 {_col0} {_col1} {_col25}
>                     keys:
>                       0
>                       1
>                     outputColumnNames: _col0, _col2, _col15, _col27, _col28, _col52
>                     input vertices:
>                       1 Map 1
>                     Statistics: Num rows: 90761969664 Data size: 7924217282560 Basic
stats: COMPLETE Column stats: NONE
>                     Filter Operator
>                       predicate: ((_col2 = _col27) and (_col0 = _col52)) (type: boolean)
>                       Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic
stats: COMPLETE Column stats: NONE
>                       Select Operator
>                         expressions: _col15 (type: float), _col28 (type: string)
>                         outputColumnNames: _col15, _col59
>                         Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic
stats: COMPLETE Column stats: NONE
>                         Map Join Operator
>                           condition map:
>                                Left Semi Join 0 to 1
>                           condition expressions:
>                             0 {_col15} {_col59}
>                             1
>                           keys:
>                             0 _col59 (type: string)
>                             1 _col0 (type: string)
>                           outputColumnNames: _col15, _col59
>                           input vertices:
>                             1 Map 2
>                           Statistics: Num rows: 24959541248 Data size: 2179159818240
Basic stats: COMPLETE Column stats: NONE
>                           Select Operator
>                             expressions: _col59 (type: string), _col15 (type: float)
>                             outputColumnNames: _col0, _col1
>                             Statistics: Num rows: 24959541248 Data size: 2179159818240
Basic stats: COMPLETE Column stats: NONE
>                             Group By Operator
>                               aggregations: sum(_col1)
>                               keys: _col0 (type: string)
>                               mode: hash
>                               outputColumnNames: _col0, _col1
>                               Statistics: Num rows: 24959541248 Data size: 2179159818240
Basic stats: COMPLETE Column stats: NONE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: string)
>                                 sort order: +
>                                 Map-reduce partition columns: _col0 (type: string)
>                                 Statistics: Num rows: 24959541248 Data size: 2179159818240
Basic stats: COMPLETE Column stats: NONE
>                                 value expressions: _col1 (type: double)
>             Execution mode: vectorized
>         Reducer 5
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats:
COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: string), _col1 (type: double)
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats:
COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic
stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> No cross product generated for this query
> {code}
>  select i_item_id,sum(ss_ext_sales_price) total_sales
>  from
>         store_sales,
>         date_dim,
>          item
> where i_color in ('purple','burlywood','indian')
>  and     ss_item_sk              = i_item_sk
>  and     ss_sold_date_sk         = d_date_sk
>  and     d_year                  = 2001
>  and     d_moy                   = 1
>  group by i_item_id;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message