spark-issues mailing list archives

From "Dilip Biswal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23281) Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases
Date Wed, 31 Jan 2018 09:37:02 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dilip Biswal updated SPARK-23281:
---------------------------------
    Description: 
Here is the test snippet.

{code}
scala> Seq[(Integer, Integer)](
     |         (1, 1),
     |         (1, 3),
     |         (2, 3),
     |         (3, 3),
     |         (4, null),
     |         (5, null)
     |       ).toDF("key", "value").createOrReplaceTempView("src")

scala> sql(
     |         """
     |           |SELECT MAX(value) as value, key as col2
     |           |FROM src
     |           |GROUP BY key
     |           |ORDER BY value desc, key
     |         """.stripMargin).show
+-----+----+
|value|col2|
+-----+----+
|    3|   3|
|    3|   2|
|    3|   1|
| null|   5|
| null|   4|
+-----+----+
{code}

Here is the explain output :
{code}
== Parsed Logical Plan ==
'Sort ['value DESC NULLS LAST, 'key ASC NULLS FIRST], true
+- 'Aggregate ['key], ['MAX('value) AS value#9, 'key AS col2#10]
   +- 'UnresolvedRelation `src`

== Analyzed Logical Plan ==
value: int, col2: int
Project [value#9, col2#10]
+- Sort [value#9 DESC NULLS LAST, col2#10 DESC NULLS LAST], true
   +- Aggregate [key#5], [max(value#6) AS value#9, key#5 AS col2#10]
      +- SubqueryAlias src
         +- Project [_1#2 AS key#5, _2#3 AS value#6]
            +- LocalRelation [_1#2, _2#3]
{code}

The sort direction for the second column should be ascending. Instead, it is being changed
to descending in Analyzer.resolveAggregateFunctions.

The above testcase models TPCDS-Q71 and thus we have the same issue in Q71 as well.
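
For reference, the ordering the query is meant to produce can be modelled in plain Scala, without Spark. This is only a sketch of the intended semantics (value DESC NULLS LAST, then key ASC), not the analyzer's logic; the names `aggregated` and `expected` are illustrative, and `None` stands in for SQL NULL.

```scala
// Plain-Scala model of the query's intended semantics (no Spark needed).
// Input rows mirror the `src` view; None stands in for SQL NULL.
val src: Seq[(Int, Option[Int])] = Seq(
  (1, Some(1)), (1, Some(3)), (2, Some(3)), (3, Some(3)), (4, None), (5, None)
)

// GROUP BY key with MAX(value): MAX ignores NULLs and is NULL for an all-NULL group.
val aggregated: Seq[(Option[Int], Int)] =
  src.groupBy(_._1).toSeq.map { case (key, rows) =>
    val values = rows.flatMap(_._2)
    (if (values.isEmpty) None else Some(values.max), key)
  }

// ORDER BY value DESC NULLS LAST, key ASC.
// Sort key: NULL flag first (false < true), then negated value, then key.
// Negation is safe for this small data set; it would overflow for Int.MinValue.
val expected: Seq[(Option[Int], Int)] = aggregated.sortBy { case (value, key) =>
  (value.isEmpty, value.map(v => -v).getOrElse(0), key)
}

expected.foreach { case (value, key) =>
  println(s"${value.map(_.toString).getOrElse("null")}\t$key")
}
```

Under this model the rows with value 3 come out with col2 in the order 1, 2, 3, followed by the null rows with col2 4 and 5, which is the reverse of the key order shown in the output above.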


  was:
Here is the test snippet.

{code}
scala> Seq[(Integer, Integer)](
     |         (1, 1),
     |         (1, 3),
     |         (2, 3),
     |         (3, 3),
     |         (4, null),
     |         (5, null)
     |       ).toDF("key", "value").createOrReplaceTempView("src")

scala> sql(
     |         """
     |           |SELECT MAX(value) as value, key as col2
     |           |FROM src
     |           |GROUP BY key
     |           |ORDER BY value desc, key
     |         """.stripMargin).show
+-----+----+
|value|col2|
+-----+----+
|    3|   3|
|    3|   2|
|    3|   1|
| null|   5|
| null|   4|
+-----+----+
{code}

Here is the explain output :
{code}
== Parsed Logical Plan ==
'Sort ['value DESC NULLS LAST, 'key ASC NULLS FIRST], true
+- 'Aggregate ['key], ['MAX('value) AS value#9, 'key AS col2#10]
   +- 'UnresolvedRelation `src`

== Analyzed Logical Plan ==
value: int, col2: int
Project [value#9, col2#10]
+- Sort [value#9 DESC NULLS LAST, col2#10 DESC NULLS LAST], true
   +- Aggregate [key#5], [max(value#6) AS value#9, key#5 AS col2#10]
      +- SubqueryAlias src
         +- Project [_1#2 AS key#5, _2#3 AS value#6]
            +- LocalRelation [_1#2, _2#3]
{code}

The sort direction for the second column should be ascending. Instead, it is being changed
to descending in Analyzer.resolveAggregateFunctions.




> Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23281
>                 URL: https://issues.apache.org/jira/browse/SPARK-23281
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Dilip Biswal
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


