spark-issues mailing list archives

From "KaiXinXIaoLei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23542) The exists action should be further optimized in logical plan
Date Fri, 02 Mar 2018 03:24:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXinXIaoLei updated SPARK-23542:
----------------------------------
    Summary: The exists action should be further optimized in logical plan  (was: The 'where exists' action in optimized logical plan should be optimized)

> The exists action should be further optimized in logical plan
> -------------------------------------------------------------
>
>                 Key: SPARK-23542
>                 URL: https://issues.apache.org/jira/browse/SPARK-23542
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: KaiXinXIaoLei
>            Priority: Major
>
> The optimized logical plan of the query '*select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i)*' is:
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- HiveTableRelation `default`.`tt1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
>    +- HiveTableRelation `default`.`tt2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]
> {code}
>  
> But the optimized logical plan of the query '*select * from tt1 left semi join tt2 on tt2.i = tt1.i*' is:
> {noformat}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#22 = i#20)
> :- Filter isnotnull(i#20)
> :  +- HiveTableRelation `default`.`tt1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
> +- Project [i#22]
>    +- HiveTableRelation `default`.`tt2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]
> {noformat}
>  
> So I think the optimized logical plan of '*select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i)*' should be further optimized, so that it also filters out null values of tt1.i before the join:
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- Filter isnotnull(i#14)
> :  +- HiveTableRelation `default`.`tt1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
>    +- HiveTableRelation `default`.`tt2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]
> {code}
>  
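
As a rough illustration of why the proposed `Filter isnotnull` on tt1.i should not change the result, here is a minimal plain-Python sketch. It does not use Spark: `sql_eq`, `left_semi_join` and `filter_is_not_null` are hypothetical helpers written only for this example, and the sample rows are made up. It models SQL equality, where comparing with NULL never yields TRUE, and checks that a left semi join returns the same rows whether or not null keys are removed from the left side first.

```python
# Illustrative model only: these helpers are not Spark APIs.

def sql_eq(a, b):
    """SQL-style equality: any comparison involving NULL (None) yields NULL (None)."""
    if a is None or b is None:
        return None
    return a == b


def left_semi_join(left, right, key="i"):
    """Keep a left row only if the join condition is TRUE for at least one right row."""
    return [l for l in left if any(sql_eq(l[key], r[key]) is True for r in right)]


def filter_is_not_null(rows, key="i"):
    """Model of `Filter isnotnull(key)`: drop rows whose key is NULL."""
    return [row for row in rows if row[key] is not None]


# Made-up sample data with the same columns as tt1 and tt2 (i, s).
tt1 = [{"i": 1, "s": "a"}, {"i": None, "s": "b"}, {"i": 3, "s": "c"}]
tt2 = [{"i": 1, "s": "x"}, {"i": None, "s": "y"}, {"i": 4, "s": "z"}]

plain = left_semi_join(tt1, tt2)
prefiltered = left_semi_join(filter_is_not_null(tt1), tt2)

# Rows with a NULL key can never satisfy tt1.i = tt2.i, so both results agree.
assert plain == prefiltered
print(plain)
```

Under this model the pre-filter only removes rows that the join would discard anyway, which is why it appears safe to push it below the join for the `exists` form as well.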



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org