hive-issues mailing list archives

From "Wei Yan (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (HIVE-14630) Enable PPD for AND conditions when CBO is disabled
Date Fri, 16 Sep 2016 16:49:20 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan reassigned HIVE-14630:
------------------------------

    Assignee: Wei Yan

> Enable PPD for AND conditions when CBO is disabled
> --------------------------------------------------
>
>                 Key: HIVE-14630
>                 URL: https://issues.apache.org/jira/browse/HIVE-14630
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Wei Yan
>
> Currently, the PPD (predicate pushdown) optimization does not seem to handle AND conditions
> well when CBO is not used. To illustrate with an example:
> Table a:
> || col || type || part_col? ||
> | id | int | no |
> | datestr | string | yes |
> Table b:
> || col || type || part_col? ||
> | id | int | no |
> And the following query:
> {code}
> SELECT a.id FROM a JOIN b
> ON a.id = b.id
> WHERE a.datestr >= '2016-08-20'
> AND rand() > 0.5
> {code}
> For this query, the plan looks like the following:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: a
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: id is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Reduce Output Operator
>                 key expressions: id (type: bigint)
>                 sort order: +
>                 Map-reduce partition columns: id (type: bigint)
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 value expressions: datestr (type: string)
>           TableScan
>             alias: b
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: id is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Reduce Output Operator
>                 key expressions: id (type: bigint)
>                 sort order: +
>                 Map-reduce partition columns: id (type: bigint)
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Inner Join 0 to 1
>           keys:
>             0 id (type: bigint)
>             1 id (type: bigint)
>           outputColumnNames: _col0, _col2
>           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: ((_col2 >= '2016-08-20') and (rand() > 0.5)) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Select Operator
>               expressions: _col0 (type: bigint)
>               outputColumnNames: _col0
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 table:
>                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Note that the predicate {{a.datestr >= '2016-08-20'}} is not pushed down to the scan of
> table a: because {{rand()}} is not deterministic, the whole AND predicate is treated as
> ineligible and stays in the Filter Operator after the join.
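
The behavior the issue title asks for can be sketched as splitting the AND predicate into its conjuncts and pushing only the deterministic ones that reference a single table. The following is a minimal illustration in Python, not Hive's actual optimizer code; the names `Pred` and `split_conjuncts` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Pred:
    """One conjunct of a WHERE clause (hypothetical model, not a Hive class)."""
    text: str                   # the predicate as written in the query
    tables: FrozenSet[str]      # table aliases the predicate references
    deterministic: bool = True  # False for calls such as rand()


def split_conjuncts(conjuncts: List[Pred], alias: str) -> Tuple[List[Pred], List[Pred]]:
    """Partition AND-ed conjuncts into (pushable to `alias`, residual)."""
    pushable, residual = [], []
    for p in conjuncts:
        # Push only deterministic conjuncts that touch exactly this one table.
        if p.deterministic and p.tables == frozenset({alias}):
            pushable.append(p)
        else:
            residual.append(p)
    return pushable, residual


# The WHERE clause from the query above, already split on AND.
where = [
    Pred("a.datestr >= '2016-08-20'", frozenset({"a"})),
    Pred("rand() > 0.5", frozenset(), deterministic=False),
]

pushed, kept = split_conjuncts(where, "a")
print("push to scan of a:", [p.text for p in pushed])
print("keep after join:  ", [p.text for p in kept])
```

Under this sketch, the {{datestr}} filter would be evaluated at the scan of table a (where it could also drive partition pruning), while {{rand() > 0.5}} would remain in the post-join filter.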



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
