spark-issues mailing list archives

From "Xiao Li (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-21538) Attribute resolution inconsistency in Dataset API
Date Thu, 27 Jul 2017 23:52:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-21538.
-----------------------------
       Resolution: Fixed
         Assignee: Anton Okolnychyi
    Fix Version/s: 2.3.0
                   2.2.1

> Attribute resolution inconsistency in Dataset API
> -------------------------------------------------
>
>                 Key: SPARK-21538
>                 URL: https://issues.apache.org/jira/browse/SPARK-21538
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Adrian Ionescu
>            Assignee: Anton Okolnychyi
>             Fix For: 2.2.1, 2.3.0
>
>
> {code}
> spark.range(1).withColumnRenamed("id", "x").sort(col("id"))  // works
> spark.range(1).withColumnRenamed("id", "x").sort($"id")  // works
> spark.range(1).withColumnRenamed("id", "x").sort('id) // works
> spark.range(1).withColumnRenamed("id", "x").sort("id") // fails with:
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
> ...
> {code}
> It looks like the Dataset API functions that take a {{String}} use the basic resolver, which
> only looks at the columns at that level, whereas all the other means of expressing an attribute
> are resolved lazily by the analyzer.
> The reason why the first 3 calls work is explained in the docs for {{object ResolveMissingReferences}}:
> {code}
>   /**
>    * In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT
>    * clause.  This rule detects such queries and adds the required attributes to the original
>    * projection, so that they will be available during sorting. Another projection is added to
>    * remove these attributes after sorting.
>    *
>    * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
>    */
> {code}
> For consistency, it would be good to use the same attribute resolution mechanism everywhere.
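
Editorial sketch, not part of the original report: the snippet below illustrates what the quoted doc comment describes. It first sorts by the renamed-away column through a {{Column}} expression, which the report says works, and then writes out by hand the "add the attribute, sort, project it away" rewrite. The object name, the local master setting, and the app name are assumptions made only for illustration; the hand-written query is an approximation of the rule's effect, not the plan Spark actually produces.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical driver object; the name is illustrative only.
object SortByRenamedColumn {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-21538-sketch")
      .getOrCreate()

    val renamed = spark.range(3).withColumnRenamed("id", "x")

    // Column-based sort: "id" stays unresolved until analysis, where the
    // missing attribute can be pulled from the child plan.
    val viaAnalyzer = renamed.sort(col("id").desc)

    // Hand-written approximation of what the doc comment describes:
    // keep the sort attribute in the projection, sort, then project it away.
    val byHand = spark.range(3)
      .select(col("id"), col("id").as("x"))
      .sort(col("id").desc)
      .drop("id")

    viaAnalyzer.show()
    byHand.show()

    spark.stop()
  }
}
```

On versions without the fix, passing a {{Column}} to {{sort}} as in {{viaAnalyzer}} is the form the report lists as working, whereas {{sort("id")}} on the renamed Dataset is the form it lists as failing.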



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

