spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (SPARK-8317) Do not push sort into shuffle in Exchange operator
Date Tue, 14 Jul 2015 23:43:05 GMT


Apache Spark commented on SPARK-8317:

User 'JoshRosen' has created a pull request for this issue:

> Do not push sort into shuffle in Exchange operator
> --------------------------------------------------
>                 Key: SPARK-8317
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 1.5.0
>
> In some cases, Spark SQL pushes sorting operations into the shuffle layer by
> specifying a key ordering as part of the shuffle dependency. I think that we
> should not do this:
> - Since we do not delegate aggregation to Spark's shuffle, specifying the
>   keyOrdering as part of the shuffle has no effect on the shuffle map side.
> - By performing the sort ourselves (by inserting a sort operator after the
>   shuffle instead), we can use the Exchange planner to choose specialized
>   sorting implementations based on the types of rows being sorted.
> - We can remove some complexity from SqlSerializer2 by not requiring it to
>   know about sort orderings, since SQL's own sort operators will already
>   perform the necessary defensive copying.
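
The proposal above (shuffle first, then sort in a separate operator) can be illustrated with a minimal sketch. It is a plain-Python simulation, not Spark code: the function names `shuffle` and `sort_after_shuffle` are hypothetical, integer keys are assumed, and nothing here reflects the actual Exchange implementation or the linked pull request.

```python
from collections import defaultdict

def shuffle(rows, num_partitions, key):
    """Partition rows by key only; no ordering is requested from the shuffle.

    Assumes integer keys so that partition assignment is deterministic.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row) % num_partitions].append(row)
    return [partitions[i] for i in range(num_partitions)]

def sort_after_shuffle(partitions, key):
    """Separate sort step applied to each partition after the shuffle."""
    return [sorted(part, key=key) for part in partitions]

rows = [(3, "c"), (1, "a"), (4, "d"), (2, "b"), (1, "z")]
key = lambda row: row[0]

shuffled = shuffle(rows, num_partitions=2, key=key)  # rows grouped, unsorted
result = sort_after_shuffle(shuffled, key)           # each partition sorted
```

Because the sort is its own step, the caller is free to swap in a different sorting routine per row type without the shuffle needing to know about orderings, which is the design point the description argues for.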

This message was sent by Atlassian JIRA
