hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled
Date Thu, 08 Oct 2015 12:36:26 GMT

     [ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-9695:
------------------------------------------
    Affects Version/s:     (was: 1.1.0)
                           (was: 1.0.0)
                           (was: 0.14.0)
                       2.0.0

> Redundant filter operator in reducer Vertex when CBO is disabled
> ----------------------------------------------------------------
>
>                 Key: HIVE-9695
>                 URL: https://issues.apache.org/jira/browse/HIVE-9695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>    Affects Versions: 2.0.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 2.0.0
>
>         Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch
>
>
> There is a redundant filter operator in reducer Vertex when CBO is disabled.
> Query 
> {code}
> select 
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales a, store_returns b, store
>     where
>         a.ss_item_sk = b.sr_item_sk
>             and a.ss_ticket_number = b.sr_ticket_number 
>             and ss_sold_date_sk between 2450816 and 2451500
>             and sr_returned_date_sk between 2450816 and 2451500
>             and s_store_sk = ss_store_sk;
> {code}
> Plan snippet 
> {code}
>   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
> {code}
> Full plan with CBO disabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
>                   Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: sr_returned_date_sk (type: int)
>             Execution mode: vectorized
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: s_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: s_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: s_store_sk (type: int)
>                       sort order: +
>                       Map-reduce partition columns: s_store_sk (type: int)
>                       Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
>                   Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
>                     Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
>                       Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
>                   1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
>                 outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
>                 Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   condition expressions:
>                     0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
>                     1 {s_store_sk}
>                   keys:
>                     0 _col6 (type: int)
>                     1 s_store_sk (type: int)
>                   outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
>                   input vertices:
>                     1 Map 3
>                   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
>                     Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
>                       File Output Operator
>                         compressed: false
>                         Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
>                         table:
>                             input format: org.apache.hadoop.mapred.TextInputFormat
>                             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Full plan with CBO enabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 4 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: s_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: s_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: s_store_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                   Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: int)
>                         sort order: ++
>                         Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
>                         Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
>                   Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col0} {_col1} {_col2}
>                           1
>                         keys:
>                           0 _col1 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col0, _col1, _col2
>                         input vertices:
>                           1 Map 1
>                         Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int), _col2 (type: int)
>                           sort order: ++
>                           Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
>                           Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
>                           value expressions: _col1 (type: int)
>             Execution mode: vectorized
>         Reducer 3
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
>                   1
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
>                   outputColumnNames: _col0, _col1, _col2
>                   Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
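> A minimal sketch of how the two plans above might be regenerated for comparison. It assumes a Hive session running on Tez with the tables named in the query already present. The session settings used by the reporter are not stated in this issue, so the set statements below are assumptions, not the reporter's actual configuration.
> {code}
> -- Assumption: Tez is the execution engine, as the plans above suggest.
> set hive.execution.engine=tez;
> -- Turn the cost-based optimizer off, then inspect Reducer 2 for the extra Filter Operator.
> set hive.cbo.enable=false;
> explain
> select
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales a, store_returns b, store
>     where
>         a.ss_item_sk = b.sr_item_sk
>             and a.ss_ticket_number = b.sr_ticket_number
>             and ss_sold_date_sk between 2450816 and 2451500
>             and sr_returned_date_sk between 2450816 and 2451500
>             and s_store_sk = ss_store_sk;
> -- Turn the optimizer back on and rerun the same explain statement to get the second plan.
> set hive.cbo.enable=true;
> {code}
> In the plans above, the Filter Operator in question is the one that follows the Map Join Operator in Reducer 2; the CBO-enabled plan has no Filter Operator in Reducer 3.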



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
