spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-21998) SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning
Date Tue, 19 Sep 2017 18:55:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21998:
------------------------------------

    Assignee:     (was: Apache Spark)

> SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-21998
>                 URL: https://issues.apache.org/jira/browse/SPARK-21998
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Maryann Xue
>            Priority: Minor
>
> Right now, the calculation of SortMergeJoinExec's outputOrdering relies on the assumption that its children have already been sorted on the join keys, which is often not true until EnsureRequirements has been applied.
> {code}
>   /**
>    * For SMJ, child's output must have been sorted on key or expressions with the same order as
>    * key, so we can get ordering for key from child's output ordering.
>    */
>   private def getKeyOrdering(keys: Seq[Expression], childOutputOrdering: Seq[SortOrder])
>     : Seq[SortOrder] = {
>     keys.zip(childOutputOrdering).map { case (key, childOrder) =>
>       SortOrder(key, Ascending, childOrder.sameOrderExpressions + childOrder.child - key)
>     }
>   }
> {code}
> Thus, SortMergeJoinExec's outputOrdering is most likely incorrect during the physical planning stage, and as a result, physical optimizations that rely on the required/output orderings, such as SPARK-18591, will not work for SortMergeJoinExec.
> The correct behavior of {{getKeyOrdering(keys, childOutputOrdering)}} should be:
> 1. If childOutputOrdering satisfies (is a superset of) the required child ordering => childOutputOrdering
> 2. Otherwise => the required child ordering
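
The two rules above can be contrasted with the quoted logic in a small Scala sketch. This is a self-contained illustration under stated assumptions, not Spark's code: `Expr`, `SortOrder`, `satisfies`, `currentKeyOrdering` and `proposedKeyOrdering` are hypothetical, simplified stand-ins for the Catalyst classes (direction is assumed ascending and null ordering is ignored). "Satisfies (is a superset of)" is interpreted here, as an assumption, to mean that every required SortOrder is matched, position by position, by a prefix of the child's ordering, either directly or through its sameOrderExpressions.

```scala
// Hypothetical, simplified stand-ins for Catalyst's Expression and SortOrder.
// Ascending direction is implied; null ordering is ignored.
final case class Expr(name: String)

final case class SortOrder(child: Expr, sameOrderExpressions: Set[Expr] = Set.empty) {
  // Assumed satisfaction test: this ordering covers `required` if it sorts on the
  // same expression or on an expression known to have the same order.
  def satisfies(required: SortOrder): Boolean =
    child == required.child || sameOrderExpressions.contains(required.child)
}

object KeyOrderingSketch {
  // Mirrors the quoted getKeyOrdering: pairs keys with the child's ordering by
  // position, without checking that the child is actually sorted on the keys.
  def currentKeyOrdering(keys: Seq[Expr], childOutputOrdering: Seq[SortOrder]): Seq[SortOrder] =
    keys.zip(childOutputOrdering).map { case (key, childOrder) =>
      SortOrder(key, childOrder.sameOrderExpressions + childOrder.child - key)
    }

  // Proposed rule: use childOutputOrdering only if it satisfies the required
  // ordering on the keys; otherwise fall back to the required ordering itself.
  def proposedKeyOrdering(keys: Seq[Expr], childOutputOrdering: Seq[SortOrder]): Seq[SortOrder] = {
    val required = keys.map(k => SortOrder(k))
    val satisfied =
      childOutputOrdering.length >= required.length &&
        required.zip(childOutputOrdering).forall { case (r, c) => c.satisfies(r) }
    if (satisfied) childOutputOrdering else required
  }
}
```

For example, if a child is still sorted only on `b` during planning, `currentKeyOrdering(Seq(a), Seq(SortOrder(b)))` reports `SortOrder(a, Set(b))`, claiming the output is sorted on `a`, while `proposedKeyOrdering` returns the required ordering `Seq(SortOrder(a))`. With no child ordering at all, the quoted logic yields an empty ordering because `zip` stops at the shorter sequence.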



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

