spark-issues mailing list archives

From "Wang, Gang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join
Date Fri, 07 Dec 2018 13:47:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712873#comment-16712873 ]

Wang, Gang commented on SPARK-25401:
------------------------------------

Yeah. I think so. 

And please make sure the outputOrdering of SortMergeJoin is aligned with the reordered keys.

> Reorder the required ordering to match the table's output ordering for bucket join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, we check whether a SortExec is needed between an operator and its child operator in the method orderingSatisfies, which requires the SortOrders of the required ordering and the child's output ordering to appear in the same order.
> However, consider the following case:
>  * Table a is bucketed by (a1, a2), sorted by (a2, a1), and has 200 buckets.
>  * Table b is bucketed by (b1, b2), sorted by (b2, b1), and has 200 buckets.
>  * Table a is joined with table b on (a1=b1, a2=b2).
> In this case, if the join is a sort merge join, the query planner won't add an exchange on either side, but it will add a sort on both sides. The sort is also unnecessary: within matching buckets, such as bucket 1 of table a and bucket 1 of table b, the condition (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).
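
The case described above can be sketched as a small standalone model. This is not Spark's code: ordering_satisfies is a simplified stand-in for the position-by-position check the description attributes to orderingSatisfies, and reorder_join_keys is a hypothetical helper that permutes the join key pairs to follow one side's sort columns. Columns are plain strings, and sort direction and null ordering are ignored.

```python
from typing import List, Optional, Sequence, Tuple


def ordering_satisfies(child_ordering: Sequence[str],
                       required_ordering: Sequence[str]) -> bool:
    """Simplified model: the required ordering must be a prefix of the
    child's ordering, compared position by position."""
    if len(required_ordering) > len(child_ordering):
        return False
    return all(r == c for r, c in zip(required_ordering, child_ordering))


def reorder_join_keys(left_keys: Sequence[str],
                      right_keys: Sequence[str],
                      left_sort_cols: Sequence[str]
                      ) -> Optional[List[Tuple[str, str]]]:
    """Hypothetical helper: permute the (left, right) key pairs so the left
    keys follow left_sort_cols. Returns None when the leading sort columns
    are not exactly the set of join keys."""
    if len(set(left_keys)) != len(left_keys):
        return None  # duplicate keys are out of scope for this sketch
    prefix = list(left_sort_cols[:len(left_keys)])
    if len(prefix) < len(left_keys) or set(prefix) != set(left_keys):
        return None
    pairs = dict(zip(left_keys, right_keys))
    return [(left, pairs[left]) for left in prefix]


# Join on (a1=b1, a2=b2); table a sorted by (a2, a1), table b by (b2, b1).
left_keys, right_keys = ["a1", "a2"], ["b1", "b2"]
a_sorted_by, b_sorted_by = ["a2", "a1"], ["b2", "b1"]

# With the keys as written, the check fails, so a sort would be added.
needs_sort_before = not ordering_satisfies(a_sorted_by, left_keys)

# After reordering the pairs, both sides already satisfy the requirement.
reordered = reorder_join_keys(left_keys, right_keys, a_sorted_by)
new_left = [left for left, _ in reordered]
new_right = [right for _, right in reordered]
needs_sort_after = not (ordering_satisfies(a_sorted_by, new_left)
                        and ordering_satisfies(b_sorted_by, new_right))
```

In this model the reordered pairs are (a2, b2), (a1, b1), so neither side needs a sort. The right side must be checked separately, as done here, because reordering by the left side's sort columns does not guarantee the right side matches. Any ordering the join reports downstream would also have to follow the reordered keys (a2, a1), which is the point raised in the comment about outputOrdering.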



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

