hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO
Date Mon, 06 Jul 2015 21:56:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615740#comment-14615740
] 

Mostafa Mokhtar commented on HIVE-11110:
----------------------------------------

[~jpullokkaran]

this is the full query 
{code}
select  i_item_id
       ,i_item_desc
       ,s_state
       ,count(ss_quantity) as store_sales_quantitycount
       ,avg(ss_quantity) as store_sales_quantityave
       ,stddev_samp(ss_quantity) as store_sales_quantitystdev
       ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
       ,count(sr_return_quantity) as_store_returns_quantitycount
       ,avg(sr_return_quantity) as_store_returns_quantityave
       ,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev
       ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov
       ,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave
       ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitystdev
       ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
 from store_sales
     ,store_returns
     ,catalog_sales
     ,date_dim d1
     ,date_dim d2
     ,date_dim d3
     ,store
     ,item
 where d1.d_quarter_name = '2000Q1'
   and d1.d_date_sk = store_sales.ss_sold_date_sk
   and item.i_item_sk = store_sales.ss_item_sk
   and store.s_store_sk = store_sales.ss_store_sk
   and store_sales.ss_customer_sk = store_returns.sr_customer_sk
   and store_sales.ss_item_sk = store_returns.sr_item_sk
   and store_sales.ss_ticket_number = store_returns.sr_ticket_number
   and store_returns.sr_returned_date_sk = d2.d_date_sk
   and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
   and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
   and store_returns.sr_item_sk = catalog_sales.cs_item_sk
   and catalog_sales.cs_sold_date_sk = d3.d_date_sk
   and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
 group by i_item_id
         ,i_item_desc
         ,s_state
 order by i_item_id
         ,i_item_desc
         ,s_state
limit 100;
{code}

Expected plan 
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 10 <- Map 11 (BROADCAST_EDGE)
        Map 3 <- Map 7 (BROADCAST_EDGE)
        Map 8 <- Map 10 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
        Reducer 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE), Map 3 (SIMPLE_EDGE),
Map 8 (SIMPLE_EDGE)
        Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
        Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
      DagName: jenkins_20150706174402_eceec100-6023-4058-85de-5cc96c9a150e:2
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: i_item_sk is not null (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: i_item_sk is not null (type: boolean)
                    Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: i_item_sk (type: int), i_item_id (type: string), i_item_desc
(type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE
Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE
Column stats: COMPLETE
                        value expressions: _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 10
            Map Operator Tree:
                TableScan
                  alias: store_returns
                  filterExpr: ((sr_customer_sk is not null and sr_item_sk is not null) and
sr_ticket_number is not null) (type: boolean)
                  Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: ((sr_customer_sk is not null and sr_item_sk is not null) and
sr_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 54568434 Data size: 1083441396 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: sr_item_sk (type: int), sr_customer_sk (type: int), sr_ticket_number
(type: int), sr_return_quantity (type: int), sr_returned_date_sk (type: int)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4
                      Statistics: Num rows: 54568434 Data size: 1083441396 Basic stats: COMPLETE
Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col4 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col1, _col2, _col3
                        input vertices:
                          1 Map 11
                        Statistics: Num rows: 27283843 Data size: 436541488 Basic stats: COMPLETE
Column stats: COMPLETE
                        HybridGraceHashJoin: true
                        Reduce Output Operator
                          key expressions: _col1 (type: int), _col0 (type: int), _col2 (type:
int)
                          sort order: +++
                          Map-reduce partition columns: _col1 (type: int), _col0 (type: int),
_col2 (type: int)
                          Statistics: Num rows: 27283843 Data size: 436541488 Basic stats:
COMPLETE Column stats: COMPLETE
                          value expressions: _col3 (type: int)
            Execution mode: vectorized
        Map 11
            Map Operator Tree:
                TableScan
                  alias: d1
                  filterExpr: ((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk
is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: ((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk
is not null) (type: boolean)
                    Statistics: Num rows: 36524 Data size: 3433256 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE
Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: store_returns
                            Partition key expr: sr_returned_date_sk
                            Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE
Column stats: COMPLETE
                            Target column: sr_returned_date_sk
                            Target Vertex: Map 10
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: store
                  filterExpr: s_store_sk is not null (type: boolean)
                  Statistics: Num rows: 212 Data size: 405680 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 212 Data size: 19080 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: s_store_sk (type: int), s_state (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 212 Data size: 19080 Basic stats: COMPLETE Column
stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 212 Data size: 19080 Basic stats: COMPLETE Column
stats: COMPLETE
                        value expressions: _col1 (type: string)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: catalog_sales
                  filterExpr: (cs_bill_customer_sk is not null and cs_item_sk is not null)
(type: boolean)
                  Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (cs_bill_customer_sk is not null and cs_item_sk is not null)
(type: boolean)
                    Statistics: Num rows: 285829806 Data size: 4567531212 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: cs_bill_customer_sk (type: int), cs_item_sk (type: int),
cs_quantity (type: int), cs_sold_date_sk (type: int)
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 285829806 Data size: 4567531212 Basic stats: COMPLETE
Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col3 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col1, _col2
                        input vertices:
                          1 Map 7
                        Statistics: Num rows: 142912944 Data size: 1714955328 Basic stats:
COMPLETE Column stats: COMPLETE
                        HybridGraceHashJoin: true
                        Reduce Output Operator
                          key expressions: _col0 (type: int), _col1 (type: int)
                          sort order: ++
                          Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                          Statistics: Num rows: 142912944 Data size: 1714955328 Basic stats:
COMPLETE Column stats: COMPLETE
                          value expressions: _col2 (type: int)
            Execution mode: vectorized
        Map 7
            Map Operator Tree:
                TableScan
                  alias: d1
                  filterExpr: ((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk
is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: ((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk
is not null) (type: boolean)
                    Statistics: Num rows: 36524 Data size: 3433256 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE
Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: catalog_sales
                            Partition key expr: cs_sold_date_sk
                            Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE
Column stats: COMPLETE
                            Target column: cs_sold_date_sk
                            Target Vertex: Map 3
            Execution mode: vectorized
        Map 8
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: (((ss_customer_sk is not null and ss_item_sk is not null) and
ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
                  Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (((ss_customer_sk is not null and ss_item_sk is not null) and
ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
                    Statistics: Num rows: 524465217 Data size: 12438904520 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: ss_item_sk (type: int), ss_customer_sk (type: int), ss_store_sk
(type: int), ss_ticket_number (type: int), ss_quantity (type: int), ss_sold_date_sk (type:
int)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                      Statistics: Num rows: 524465217 Data size: 12438904520 Basic stats:
COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col5 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col4
                        input vertices:
                          1 Map 9
                        Statistics: Num rows: 29008905 Data size: 580178100 Basic stats: COMPLETE
Column stats: COMPLETE
                        HybridGraceHashJoin: true
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col1 (type: int), _col0 (type: int), _col3 (type: int)
                            1 _col1 (type: int), _col0 (type: int), _col2 (type: int)
                          outputColumnNames: _col0, _col2, _col4, _col8, _col9, _col11
                          input vertices:
                            1 Map 10
                          Statistics: Num rows: 27283843 Data size: 654812232 Basic stats:
COMPLETE Column stats: COMPLETE
                          HybridGraceHashJoin: true
                          Select Operator
                            expressions: _col0 (type: int), _col11 (type: int), _col2 (type:
int), _col4 (type: int), _col8 (type: int), _col9 (type: int)
                            outputColumnNames: _col0, _col11, _col2, _col4, _col8, _col9
                            Statistics: Num rows: 27283843 Data size: 654812232 Basic stats:
COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col9 (type: int), _col8 (type: int)
                              sort order: ++
                              Map-reduce partition columns: _col9 (type: int), _col8 (type:
int)
                              Statistics: Num rows: 27283843 Data size: 654812232 Basic stats:
COMPLETE Column stats: COMPLETE
                              value expressions: _col0 (type: int), _col2 (type: int), _col4
(type: int), _col11 (type: int)
            Execution mode: vectorized
        Map 9
            Map Operator Tree:
                TableScan
                  alias: d1
                  filterExpr: ((d_quarter_name = '2000Q1') and d_date_sk is not null) (type:
boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: ((d_quarter_name = '2000Q1') and d_date_sk is not null) (type:
boolean)
                    Statistics: Num rows: 101 Data size: 9494 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 101 Data size: 404 Basic stats: COMPLETE Column
stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 101 Data size: 404 Basic stats: COMPLETE Column
stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 101 Data size: 404 Basic stats: COMPLETE Column
stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 50 Data size: 200 Basic stats: COMPLETE Column
stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: store_sales
                            Partition key expr: ss_sold_date_sk
                            Statistics: Num rows: 50 Data size: 200 Basic stats: COMPLETE
Column stats: COMPLETE
                            Target column: ss_sold_date_sk
                            Target Vertex: Map 8
            Execution mode: vectorized
        Reducer 4
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col0 (type: int), _col1 (type: int)
                  1 _col9 (type: int), _col8 (type: int)
                outputColumnNames: _col2, _col6, _col8, _col10, _col17
                Statistics: Num rows: 27283843 Data size: 545676860 Basic stats: COMPLETE
Column stats: COMPLETE
                Select Operator
                  expressions: _col10 (type: int), _col17 (type: int), _col2 (type: int),
_col6 (type: int), _col8 (type: int)
                  outputColumnNames: _col10, _col17, _col2, _col6, _col8
                  Statistics: Num rows: 27283843 Data size: 545676860 Basic stats: COMPLETE
Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 _col0 (type: int)
                      1 _col8 (type: int)
                    outputColumnNames: _col1, _col4, _col8, _col12, _col19
                    input vertices:
                      0 Map 2
                    Statistics: Num rows: 27283843 Data size: 2782951986 Basic stats: COMPLETE
Column stats: COMPLETE
                    HybridGraceHashJoin: true
                    Select Operator
                      expressions: _col1 (type: string), _col12 (type: int), _col19 (type:
int), _col4 (type: int), _col8 (type: int)
                      outputColumnNames: _col1, _col12, _col19, _col4, _col8
                      Statistics: Num rows: 27283843 Data size: 2782951986 Basic stats: COMPLETE
Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col8 (type: int)
                        outputColumnNames: _col1, _col2, _col4, _col7, _col15, _col22
                        input vertices:
                          0 Map 1
                        Statistics: Num rows: 27283843 Data size: 10422428026 Basic stats:
COMPLETE Column stats: COMPLETE
                        HybridGraceHashJoin: true
                        Select Operator
                          expressions: _col1 (type: string), _col2 (type: string), _col4 (type:
string), _col15 (type: int), _col22 (type: int), _col7 (type: int)
                          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                          Statistics: Num rows: 27283843 Data size: 10422428026 Basic stats:
COMPLETE Column stats: COMPLETE
                          Group By Operator
                            aggregations: count(_col3), avg(_col3), stddev_samp(_col3), count(_col4),
avg(_col4), stddev_samp(_col4), count(_col5), avg(_col5), stddev_samp(_col5)
                            keys: _col0 (type: string), _col1 (type: string), _col2 (type:
string)
                            mode: hash
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11
                            Statistics: Num rows: 27283843 Data size: 10749834142 Basic stats:
COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: string), _col1 (type: string),
_col2 (type: string)
                              sort order: +++
                              Map-reduce partition columns: _col0 (type: string), _col1 (type:
string), _col2 (type: string)
                              Statistics: Num rows: 27283843 Data size: 10749834142 Basic
stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col3 (type: bigint), _col4 (type: struct<count:bigint,sum:double,input:int>),
_col5 (type: struct<count:bigint,sum:double,variance:double>), _col6 (type: bigint),
_col7 (type: struct<count:bigint,sum:double,input:int>), _col8 (type: struct<count:bigint,sum:double,variance:double>),
_col9 (type: bigint), _col10 (type: struct<count:bigint,sum:double,input:int>), _col11
(type: struct<count:bigint,sum:double,variance:double>)
        Reducer 5
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0), avg(VALUE._col1), stddev_samp(VALUE._col2),
count(VALUE._col3), avg(VALUE._col4), stddev_samp(VALUE._col5), count(VALUE._col6), avg(VALUE._col7),
stddev_samp(VALUE._col8)
                keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type:
string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7,
_col8, _col9, _col10, _col11
                Statistics: Num rows: 27283843 Data size: 12059458606 Basic stats: COMPLETE
Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: string), (_col8 / _col7)
(type: double), _col9 (type: bigint), _col10 (type: double), (_col11 / _col10) (type: double),
_col2 (type: string), _col3 (type: bigint), _col4 (type: double), _col5 (type: double), (_col5
/ _col4) (type: double), _col6 (type: bigint), _col7 (type: double), _col8 (type: double)
                  outputColumnNames: _col0, _col1, _col10, _col11, _col12, _col13, _col2,
_col3, _col4, _col5, _col6, _col7, _col8, _col9
                  Statistics: Num rows: 27283843 Data size: 12496000094 Basic stats: COMPLETE
Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string), _col1 (type: string), _col2 (type:
string)
                    sort order: +++
                    Statistics: Num rows: 27283843 Data size: 12496000094 Basic stats: COMPLETE
Column stats: COMPLETE
                    TopN Hash Memory Usage: 0.04
                    value expressions: _col3 (type: bigint), _col4 (type: double), _col5 (type:
double), _col6 (type: double), _col7 (type: bigint), _col8 (type: double), _col9 (type: double),
_col10 (type: double), _col11 (type: bigint), _col12 (type: double), _col13 (type: double)
        Reducer 6
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type:
string), KEY.reducesinkkey2 (type: string), VALUE._col0 (type: bigint), VALUE._col1 (type:
double), VALUE._col2 (type: double), VALUE._col3 (type: double), VALUE._col4 (type: bigint),
VALUE._col5 (type: double), VALUE._col6 (type: double), VALUE._col7 (type: double), VALUE._col8
(type: bigint), VALUE._col9 (type: double), VALUE._col10 (type: double), VALUE._col10 (type:
double)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7,
_col8, _col9, _col10, _col11, _col12, _col13, _col14
                Statistics: Num rows: 27283843 Data size: 12714270838 Basic stats: COMPLETE
Column stats: COMPLETE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 100 Data size: 46600 Basic stats: COMPLETE Column
stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 100 Data size: 46600 Basic stats: COMPLETE Column
stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 100
      Processor Tree:
        ListSink
{code}


> Enable HiveJoinAddNotNullRule in CBO
> ------------------------------------
>
>                 Key: HIVE-11110
>                 URL: https://issues.apache.org/jira/browse/HIVE-11110
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-11110.1.patch, HIVE-11110.2.patch, HIVE-11110.4.patch, HIVE-11110.5.patch,
HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>      ,store_returns
>      ,date_dim d1
>      ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>    and d1.d_date_sk = ss_sold_date_sk
>    and ss_customer_sk = sr_customer_sk
>    and ss_item_sk = sr_item_sk
>    and ss_ticket_number = sr_ticket_number
>    and sr_returned_date_sk = d2.d_date_sk
>    and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3’);
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join
clause. The join clause should add a filter “filterExpr: ss_sold_date_sk is not null”,
which should get pushed the MetaStore when fetching the stats. Currently this is not done
in CBO planning, which results in the stats from __HIVE_DEFAULT_PARTITION__ to be fetched
and considered in the optimization phase. In particular, this increases the NDV for the join
columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message