hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join
Date Fri, 26 Feb 2016 03:37:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168381#comment-15168381 ]

Gopal V commented on HIVE-13164:
--------------------------------

[~ctang.ma]: that actually looks like a cross-product even before optimization. The optimizer
is not the one generating the cross-product, unless there's a missing t1.key = t2.key in there?
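
To illustrate the point above, here is a minimal Python sketch of left semi join semantics. It is not Hive code: the toy rows, the `left_semi_join` helper, and the "keyed" variant (which uses a value equality in place of the suggested key equality, since both subqueries already fix key = 0) are all assumptions made up for this example. It shows that an ON clause that only tests one side against a constant keeps either every left row or none, and that pushing that test into the t2 filter gives the same result with a keyless join.

```python
# Hypothetical toy rows; the real contents of t1 and t2 are not given in the thread.
t1 = [
    {"key": 0, "value": "val_0"},
    {"key": 0, "value": "val_x"},
    {"key": 1, "value": "val_1"},
]
t2 = [
    {"key": 0, "value": "val_0"},
    {"key": 2, "value": "val_2"},
]


def left_semi_join(left, right, on):
    """Keep each left row that has at least one right row satisfying `on`."""
    return [l for l in left if any(on(l, r) for r in right)]


# The two subqueries from the report: both sides filtered on key = 0.
t1_sub = [r for r in t1 if r["key"] == 0]
t2_sub = [r for r in t2 if r["key"] == 0]

# Query as written: ON t2.value = 'val_0' never looks at the t1 row,
# so the match result is the same for every left row.
as_written = left_semi_join(t1_sub, t2_sub, lambda l, r: r["value"] == "val_0")

# After pushdown: the predicate becomes part of the t2 filter and the
# join itself has no condition left (a keyless join).
t2_pushed = [r for r in t2_sub if r["value"] == "val_0"]
after_pushdown = left_semi_join(t1_sub, t2_pushed, lambda l, r: True)

# Hypothetical keyed variant: an equality between the two sides makes the
# join selective per left row.
keyed = left_semi_join(
    t1_sub,
    t2_sub,
    lambda l, r: l["value"] == r["value"] and r["value"] == "val_0",
)

print(len(as_written), len(after_pushdown), len(keyed))
```

In this sketch the first two results are identical, which matches the reading that the query is already a cross-product as written; only the variant with a condition relating both sides filters individual left rows.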

> Predicate pushdown may cause cross-product in left semi join
> ------------------------------------------------------------
>
>                 Key: HIVE-13164
>                 URL: https://issues.apache.org/jira/browse/HIVE-13164
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>
> For some left semi join queries such as the following:
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
> or
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
> Their plans show that they have been converted to a keyless cross-product because of predicate pushdown and the dropping of the ON condition.
> {code}
> LOGICAL PLAN:
> t1:t1 
>   TableScan (TS_0)
>     alias: t1
>     Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>     Filter Operator (FIL_18)
>       predicate: (key = 0) (type: boolean)
>       Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>       Select Operator (SEL_2)
>         Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>         Reduce Output Operator (RS_9)
>           sort order: 
>           Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>           Join Operator (JOIN_11)
>             condition map:
>                  Left Semi Join 0 to 1
>             keys:
>               0 
>               1 
>             Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
>             Group By Operator (GBY_13)
>               aggregations: count(1)
>               mode: hash
>               outputColumnNames: _col0
>               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator (RS_14)
>                 sort order: 
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                 value expressions: _col0 (type: bigint)
>                 Group By Operator (GBY_15)
>                   aggregations: count(VALUE._col0)
>                   mode: mergepartial
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator (FS_17)
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> t2:t2 
>   TableScan (TS_3)
>     alias: t2
>     Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>     Filter Operator (FIL_19)
>       predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
>       Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>       Select Operator (SEL_5)
>         Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>         Group By Operator (GBY_8)
>           keys: 'val_0' (type: string)
>           mode: hash
>           outputColumnNames: _col0
>           Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>           Reduce Output Operator (RS_10)
>             sort order: 
>             Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>             Join Operator (JOIN_11)
>               condition map:
>                    Left Semi Join 0 to 1
>               keys:
>                 0 
>                 1 
>               Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
> {code}
> [~gopalv], do you think these plans are valid or not? Thanks 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
