spark-issues mailing list archives

From "Marco Gaido (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-22806) Window Aggregate functions: unexpected result at ordered partition
Date Sat, 16 Dec 2017 09:00:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Gaido resolved SPARK-22806.
---------------------------------
    Resolution: Invalid

> Window Aggregate functions: unexpected result at ordered partition
> ------------------------------------------------------------------
>
>                 Key: SPARK-22806
>                 URL: https://issues.apache.org/jira/browse/SPARK-22806
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Attila Zsolt Piros
>         Attachments: WindowFunctionsWithGroupByError.scala
>
>
> I got different results for aggregate functions (even for sum and count) when the partition is ordered, "Window.partitionBy(column).orderBy(column)", and when it is not ordered, "Window.partitionBy(column)".
> Example:
> {code:java}
> test("count, sum, stddev_pop functions over window") {
>     val df = Seq(
>       ("a", 1, 100.0),
>       ("b", 1, 200.0)).toDF("key", "partition", "value")
>     df.createOrReplaceTempView("window_table")
>     checkAnswer(
>       df.select(
>         $"key",
>         count("value").over(Window.partitionBy("partition")),
>         sum("value").over(Window.partitionBy("partition")),
>         stddev_pop("value").over(Window.partitionBy("partition"))
>       ),
>       Seq(
>         Row("a", 2, 300.0, 50.0),
>         Row("b", 2, 300.0, 50.0)))
>   }
>   test("count, sum, stddev_pop functions over ordered by window") {
>     val df = Seq(
>       ("a", 1, 100.0),
>       ("b", 1, 200.0)).toDF("key", "partition", "value")
>     df.createOrReplaceTempView("window_table")
>     checkAnswer(
>       df.select(
>         $"key",
>         count("value").over(Window.partitionBy("partition").orderBy("key")),
>         sum("value").over(Window.partitionBy("partition").orderBy("key")),
>         stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
>       ),
>       Seq(
>         Row("a", 2, 300.0, 50.0),
>         Row("b", 2, 300.0, 50.0)))
>   }
> {code}
> The "count, sum, stddev_pop functions over ordered by window" test fails with this error:
> {noformat}
> == Results ==
> !== Correct Answer - 2 ==   == Spark Answer - 2 ==
> !struct<>                   struct<key:string,count(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
> ![a,2,300.0,50.0]           [a,1,100.0,0.0]
>  [b,2,300.0,50.0]           [b,2,300.0,50.0]
> {noformat}
>  
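
The message does not state why the issue was resolved as Invalid. A likely explanation is Spark's documented default frame. When a window has an ordering but no explicit frame (the `unspecifiedframe$()` in the output above), the frame runs from the start of the partition to the current row, so aggregates become running values. The sketch below is not from the issue. It assumes a local SparkSession, and the names `OrderedWindowFrameSketch` and `wholePartitionStats` are hypothetical. It shows one way to keep the ordering while still aggregating over the whole partition, by giving the frame explicitly with `rowsBetween`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, stddev_pop, sum}

// Hypothetical helper object, for illustration only.
object OrderedWindowFrameSketch {

  // Returns (key, count, sum, stddev_pop) per row, computed over an ordered
  // window whose frame explicitly spans the entire partition.
  def wholePartitionStats(spark: SparkSession): Seq[(String, Long, Double, Double)] = {
    import spark.implicits._

    // Same two rows as in the issue's tests.
    val df = Seq(
      ("a", 1, 100.0),
      ("b", 1, 200.0)).toDF("key", "partition", "value")

    // Without rowsBetween, the orderBy would imply a frame that ends at the
    // current row, which yields running aggregates.
    val wholePartition = Window
      .partitionBy("partition")
      .orderBy("key")
      .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    df.select(
        $"key",
        count("value").over(wholePartition),
        sum("value").over(wholePartition),
        stddev_pop("value").over(wholePartition))
      .orderBy("key")
      .collect()
      .map(r => (r.getString(0), r.getLong(1), r.getDouble(2), r.getDouble(3)))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ordered-window-frame-sketch")
      .getOrCreate()
    try wholePartitionStats(spark).foreach(println)
    finally spark.stop()
  }
}
```

With the explicit frame, both rows see every row of the partition, so the ordered window gives the same values as the unordered one for this data.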



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


