spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nattavut Sutyanyong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19086) Improper scoping of name resolution of columns in HAVING clause
Date Fri, 06 Jan 2017 03:22:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803381#comment-15803381
] 

Nattavut Sutyanyong commented on SPARK-19086:
---------------------------------------------

I wore a conservative lens when opening this JIRA. This is from the point of view that `t2c`
is available on the current query block but due to the aggregate, it just cannot be referenced
above the aggregate. I have checked on MySQL. It uses a progressive lens. So does Spark. When
it cannot reference a column, it goes out to the next outer scope to find one.

The problem with this, especially when deep correlation is supported, is the outer scope where
the column is resolved can be 10 levels up. (okay, I am exaggerated here. Who, or which program,
on earth will create such SQL!) It would be very confusing for the reader. This can be easily
solved by adding the correlation name to the column.

I will close the JIRA.



> Improper scoping of name resolution of columns in HAVING clause
> ---------------------------------------------------------------
>
>                 Key: SPARK-19086
>                 URL: https://issues.apache.org/jira/browse/SPARK-19086
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Minor
>
> There seems to be a problem on the scoping of name resolution of columns in a HAVING
clause.
> Here is a scenario of the problem:
> {code}
> // A simplified version of TC 01.13 from PR-16337
> Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
> Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
> // This is okay. 
> // Error: t2c is unresolved
> sql("select t2a from t2 group by t2a having t2c = 8").show
> // This is okay as t2c is resolved to the t2 on the parent side
> // because t2 in the subquery does not output column t2c.
> sql("select * from t2 where t2a in (select t2a from (select t2a from t2) t2 group by
t2a having t2c = 8)").explain(true)
> // This is the problem.
> sql("select * from t2 where t2a in (select t2a from t2 group by t2a having t2c = 8)").explain(true)
> == Analyzed Logical Plan ==
> t2a: int, t2b: int, t2c: int
> Project [t2a#22, t2b#23, t2c#24]
> +- Filter predicate-subquery#38 [(t2a#22 = t2a#22#49) && (t2c#24 = 8)]
>    :  +- Project [t2a#22 AS t2a#22#49]
>    :     +- Aggregate [t2a#22], [t2a#22]
>    :        +- SubqueryAlias t2, `t2`
>    :           +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>    :              +- LocalRelation [_1#18, _2#19, _3#20]
>    +- SubqueryAlias t2, `t2`
>       +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>          +- LocalRelation [_1#18, _2#19, _3#20]
> {code}
> We should not resolve {{t2c}} in the subquery to the outer {{t2}} on the parent side.
It should try to resolve {{t2c}} to the {{t2}} in the subquery from its current scope and
raise an exception because it is invalid to pull up the column {{t2c}} from the {{Aggregate}}
operator below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message