spark-issues mailing list archives

From "Andrew Ash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22204) Explain output for SQL with commands shows no optimization
Date Wed, 18 Oct 2017 01:14:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208676#comment-16208676 ]

Andrew Ash commented on SPARK-22204:
------------------------------------

One way to work around this issue is to take the child of the command node and run
explain on that.  Note that this plans the query twice, though.

See also discussion at https://github.com/apache/spark/pull/19269#discussion_r139841435
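
The workaround above could be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested recipe: it assumes a Spark version where spark.sessionState is accessible from user code (it is marked unstable), and it assumes the statement parses to a node whose first child is the inner query, as in the parsed plan shown in the issue description. The names ExplainInnerQuery and explainChild are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object ExplainInnerQuery {
  // Hypothetical helper: returns the extended plan text for the first child
  // of the parsed statement, or None if the top-level node has no children.
  def explainChild(spark: SparkSession, sql: String): Option[String] = {
    // parsePlan only parses the text; it does not analyze or run the statement.
    val parsed: LogicalPlan = spark.sessionState.sqlParser.parsePlan(sql)

    // Assumption: for a command such as CREATE TABLE ... AS SELECT, the first
    // child is the inner query. For a plain SELECT this would instead pick the
    // node directly under the outermost projection.
    parsed.children.headOption.map { child =>
      // executePlan builds a separate QueryExecution for the child, so the
      // inner query goes through analysis, optimization and planning on its own.
      spark.sessionState.executePlan(child).toString
    }
  }
}
```

In the Spark shell this might be used as ExplainInnerQuery.explainChild(spark, "CREATE TABLE IF NOT EXISTS tmptable AS SELECT ...").foreach(println). Because the child gets its own QueryExecution, the inner query is planned once here and once more when the command itself runs, which is the double planning mentioned above.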

> Explain output for SQL with commands shows no optimization
> ----------------------------------------------------------
>
>                 Key: SPARK-22204
>                 URL: https://issues.apache.org/jira/browse/SPARK-22204
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Andrew Ash
>
> When displaying the explain output for a basic SELECT query, the query plan changes as expected between the analyzed and optimized stages.  But when that same query is put into a command, for example {{CREATE TABLE}}, it appears that the optimization doesn't take place.
> In Spark shell:
> Explain output for a {{SELECT}} statement shows optimization:
> {noformat}
> scala> spark.sql("SELECT a FROM (SELECT a FROM (SELECT a FROM (SELECT 1 AS a) AS b) AS c) AS d").explain(true)
> == Parsed Logical Plan ==
> 'Project ['a]
> +- 'SubqueryAlias d
>    +- 'Project ['a]
>       +- 'SubqueryAlias c
>          +- 'Project ['a]
>             +- SubqueryAlias b
>                +- Project [1 AS a#29]
>                   +- OneRowRelation
> == Analyzed Logical Plan ==
> a: int
> Project [a#29]
> +- SubqueryAlias d
>    +- Project [a#29]
>       +- SubqueryAlias c
>          +- Project [a#29]
>             +- SubqueryAlias b
>                +- Project [1 AS a#29]
>                   +- OneRowRelation
> == Optimized Logical Plan ==
> Project [1 AS a#29]
> +- OneRowRelation
> == Physical Plan ==
> *Project [1 AS a#29]
> +- Scan OneRowRelation[]
> scala> 
> {noformat}
> But the same query run inside {{CREATE TABLE}} does not:
> {noformat}
> scala> spark.sql("CREATE TABLE IF NOT EXISTS tmptable AS SELECT a FROM (SELECT a FROM (SELECT a FROM (SELECT 1 AS a) AS b) AS c) AS d").explain(true)
> == Parsed Logical Plan ==
> 'CreateTable `tmptable`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Ignore
> +- 'Project ['a]
>    +- 'SubqueryAlias d
>       +- 'Project ['a]
>          +- 'SubqueryAlias c
>             +- 'Project ['a]
>                +- SubqueryAlias b
>                   +- Project [1 AS a#33]
>                      +- OneRowRelation
> == Analyzed Logical Plan ==
> CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
>    +- Project [a#33]
>       +- SubqueryAlias d
>          +- Project [a#33]
>             +- SubqueryAlias c
>                +- Project [a#33]
>                   +- SubqueryAlias b
>                      +- Project [1 AS a#33]
>                         +- OneRowRelation
> == Optimized Logical Plan ==
> CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
>    +- Project [a#33]
>       +- SubqueryAlias d
>          +- Project [a#33]
>             +- SubqueryAlias c
>                +- Project [a#33]
>                   +- SubqueryAlias b
>                      +- Project [1 AS a#33]
>                         +- OneRowRelation
> == Physical Plan ==
> CreateHiveTableAsSelectCommand CreateHiveTableAsSelectCommand [Database:default}, TableName: tmptable, InsertIntoHiveTable]
>    +- Project [a#33]
>       +- SubqueryAlias d
>          +- Project [a#33]
>             +- SubqueryAlias c
>                +- Project [a#33]
>                   +- SubqueryAlias b
>                      +- Project [1 AS a#33]
>                         +- OneRowRelation
> scala>
> {noformat}
> Note that there is no change between the analyzed and optimized plans when run in a command.
> This is misleading my users, as they think that there is no optimization happening in the query!


