hive-dev mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
Date Fri, 06 Feb 2015 02:09:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308462#comment-14308462 ]

Rui Li commented on HIVE-9561:
------------------------------

Yes, I'm trying to avoid using SHUFFLE_SORT for any non-order-by query. In this particular
case, though, QueryProperties.hasSortBy and QueryProperties.hasOrderBy are both true, so we
can't really distinguish the two.

As for removing the unnecessary ordering for subqueries, that sounds interesting. Maybe we
can traverse SparkWork to look for SHUFFLE_SORT edges. If the downstream work on such an edge
is not a leaf work, we can remove the sort. What do you think?
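
To make the idea concrete, here is a rough sketch of that traversal. It is not Hive code:
SortEdgeScan, Edge, ShuffleType, connect and removableSortEdges are hypothetical stand-ins
for SparkWork and its edge properties, and the works are identified by plain strings. It
only collects the SHUFFLE_SORT edges whose downstream work has children of its own; what
such an edge should be replaced with is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for SparkWork; not Hive's actual classes. */
class SortEdgeScan {
    /** Hypothetical shuffle kinds, reduced to the two this sketch needs. */
    enum ShuffleType { SHUFFLE_GROUP, SHUFFLE_SORT }

    /** A directed edge from an upstream work to a downstream work. */
    static final class Edge {
        final String parent;
        final String child;
        final ShuffleType type;

        Edge(String parent, String child, ShuffleType type) {
            this.parent = parent;
            this.child = child;
            this.type = type;
        }
    }

    private final List<Edge> edges = new ArrayList<>();
    private final Map<String, List<String>> children = new HashMap<>();

    /** Registers an edge and records the parent-to-child relationship. */
    void connect(String parent, String child, ShuffleType type) {
        edges.add(new Edge(parent, child, type));
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        children.computeIfAbsent(child, k -> new ArrayList<>());
    }

    /** A work is a leaf if nothing consumes its output. */
    boolean isLeaf(String work) {
        return children.getOrDefault(work, Collections.emptyList()).isEmpty();
    }

    /** Returns SHUFFLE_SORT edges whose downstream work is not a leaf. */
    List<Edge> removableSortEdges() {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type == ShuffleType.SHUFFLE_SORT && !isLeaf(e.child)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

For a plan shaped like Map 1 -> Reducer 2 -> Reducer 3 with both edges marked SHUFFLE_SORT,
only the edge into Reducer 2 would be reported, since Reducer 3 is the leaf that produces
the final ordering.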

> SHUFFLE_SORT should only be used for order by query [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-9561
>                 URL: https://issues.apache.org/jira/browse/HIVE-9561
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9561.1-spark.patch
>
>
> The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are
> difficult to control, so we should limit the use of {{sortByKey}} to order by queries only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
