spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Florentino Sainz (Jira)" <>
Subject [jira] [Resolved] (SPARK-29265) Window orderBy causing full-DF orderBy
Date Thu, 26 Sep 2019 21:47:00 GMT


Florentino Sainz resolved SPARK-29265.
    Resolution: Not A Bug

> Window orderBy causing full-DF orderBy 
> ---------------------------------------
>                 Key: SPARK-29265
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0, 2.4.3, 2.4.4
>         Environment: Any
>            Reporter: Florentino Sainz
>            Priority: Minor
>         Attachments: Test.scala,
> Hi,
> I had this problem in "real" environments and also made a self-contained test ( [^Test.scala]
> Having this Window definition:
> {code:scala}
> val myWindow = Window.partitionBy($"word").orderBy("word") 
> val filt2 = filtrador.withColumn("avg_Time", avg($"number").over(myWindow)){code}
> As a user, I would expect either:
> 1- Error/warning (because trying to sort on one of the columns of the window partitionBy)
> 2- A mostly-useless operation which just orders the rows inside each Window but doesn't
affect performance too much.
> Currently what I see:
> *When I use "myWindow" in any DataFrame, somehow that Window.orderBy is performing a
global orderBy on the whole DataFrame. Similar to dataframe.orderBy("word").*
> *In my real environment, my program just didn't finish in time/crashed thus causing my
program to be very slow or crash (because as it's a global orderBy, it will just go to one
> In the test I can see how all elements of my DF are in a single partition (side-effect
of the global orderBy) 
> Full Code showing the error (see how the mapPartitions shows 99 rows in one partition)
attached in Test.scala, sbt project (src and build.sbt) attached too in

This message was sent by Atlassian Jira

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message