hive-dev mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7659) Unnecessary sort in query plan
Date Thu, 14 Aug 2014 09:14:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096774#comment-14096774 ]

Rui Li commented on HIVE-7659:
------------------------------

After some research, I found that the unnecessary sort is mainly introduced when we generate the
GBY operator. This patch ignores the sort order in the RS if the partition keys, sorting keys and
grouping keys are the same. Otherwise, e.g. in the case of DISTINCT or data skew, we apply a
sort shuffle according to the sort order so that the query produces correct results.
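
The rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the attached patch: the class ShuffleChoice, the method needsSortShuffle and its parameters are hypothetical names, and the keys are modeled as plain lists of column names rather than Hive expression nodes.

```java
import java.util.List;

// Hypothetical helper for illustration only; not taken from HIVE-7659-spark.patch.
final class ShuffleChoice {
    private ShuffleChoice() {}

    /**
     * Returns true when a sorting shuffle (such as sortByKey) should be used.
     * sortOrder stands in for the RS sort-order string, e.g. "+" or "++".
     */
    static boolean needsSortShuffle(List<String> partitionKeys,
                                    List<String> sortKeys,
                                    List<String> groupingKeys,
                                    String sortOrder) {
        // Plain GROUP BY: all three key lists match, so the sort order is ignored.
        boolean sameKeys = partitionKeys.equals(sortKeys)
                && sortKeys.equals(groupingKeys);
        if (sameKeys) {
            return false;
        }
        // Otherwise (e.g. DISTINCT or data skew) honor the sort order if one is set.
        return sortOrder != null && !sortOrder.isEmpty();
    }
}
```

With this sketch, "select key from table group by key" would pass the same single-column list for all three arguments and get false, even though the sort order is '+'. A case where the sorting keys differ from the partition keys (assumed here to be what happens for DISTINCT) would get true whenever a sort order is set.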

> Unnecessary sort in query plan
> ------------------------------
>
>                 Key: HIVE-7659
>                 URL: https://issues.apache.org/jira/browse/HIVE-7659
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-7659-spark.patch
>
>
> For hive on spark.
> Currently we rely on the sort order in RS to decide whether we need a sortByKey transformation.
> However a simple group by query will also have the sort order set to '+'.
> Consider the query: select key from table group by key. The RS in the map work will have
> sort order set to '+', thus requiring a sortByKey shuffle.
> To avoid the unnecessary sort, we should either use another way to decide if there has
> to be a sort shuffle, or we should set the sort order only when sort is really needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
