spark-issues mailing list archives

From "Cheng Lian (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
Date Mon, 05 Jan 2015 18:36:34 GMT

     [ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved SPARK-4296.
-------------------------------
          Resolution: Duplicate
       Fix Version/s: 1.2.0
    Target Version/s: 1.2.0  (was: 1.3.0)

This issue is a duplicate of SPARK-4322, which has already been fixed in 1.2.0.

> Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4296
>                 URL: https://issues.apache.org/jira/browse/SPARK-4296
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Shixiong Zhu
>            Assignee: Cheng Lian
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> When the input data has a complex structure, using the same expression in the group by clause and the select clause throws "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), Person("Jim", Birthday("1980-02-28"))))
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:36
> {code}
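> The bug is in the equality test between `Upper(birthday#1.date)` and `Upper(birthday#1.date AS date#9)`: the two differ only in the inner alias, so the select expression is not recognized as the grouping expression.
> Spark SQL may need a mechanism for comparing Alias and non-Alias expressions.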
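
The alias-insensitive comparison suggested in the quoted report can be sketched roughly as follows. This is only an illustration of the idea, not the change made for SPARK-4322: `Expr`, `Attr`, `GetField`, `Upper`, `Alias`, `stripAliases`, and `semanticallyEqual` are hypothetical stand-ins defined here for the example, not Catalyst classes or methods.

```scala
// Toy expression tree. These types are illustrative stand-ins (assumptions),
// not the Catalyst classes of the same names.
sealed trait Expr
case class Attr(name: String) extends Expr
case class GetField(child: Expr, field: String) extends Expr
case class Upper(child: Expr) extends Expr
case class Alias(child: Expr, name: String) extends Expr

object AliasInsensitiveCompare {
  // Recursively drop every Alias node, keeping the expression it wraps.
  def stripAliases(e: Expr): Expr = e match {
    case Alias(child, _)    => stripAliases(child)
    case Upper(child)       => Upper(stripAliases(child))
    case GetField(child, f) => GetField(stripAliases(child), f)
    case a: Attr            => a
  }

  // Treat two expressions as the same if they match once aliases are removed.
  def semanticallyEqual(a: Expr, b: Expr): Boolean =
    stripAliases(a) == stripAliases(b)

  def main(args: Array[String]): Unit = {
    // Grouping expression: Upper(birthday.date)
    val grouping = Upper(GetField(Attr("birthday"), "date"))
    // Select expression: Upper(birthday.date AS date) AS c1
    val selected =
      Alias(Upper(Alias(GetField(Attr("birthday"), "date"), "date")), "c1")

    println(grouping == selected)                   // prints "false"
    println(semanticallyEqual(grouping, selected))  // prints "true"
  }
}
```

Plain structural equality fails here because of the alias nested inside `Upper`, while the alias-stripped comparison treats the select expression as the grouping expression.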



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

