hadoop-hive-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-647) SORT BY with GROUP ignored without LIMIT
Date Fri, 17 Jul 2009 16:52:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732599#action_12732599 ]

Bill Graham commented on HIVE-647:
----------------------------------

ORDER BY also uses multiple reducers and returns unordered results. Using SORT BY (with or
without DISTRIBUTE BY) returns N sets of ordered results, where N is the number of reducers.
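
To illustrate the "N sets of ordered results" behavior, here is a toy Python model (not Hive code). The function names and the byte-sum partitioner are hypothetical stand-ins. How Hive actually assigns rows to reducers is not described in this thread, so treat the partitioning step as an assumption.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (user, num)


def toy_partition(key: str, num_reducers: int) -> int:
    # Assumption: a deterministic stand-in for whatever partitioner is used.
    return sum(key.encode("utf-8")) % num_reducers


def sort_by_per_reducer(rows: List[Row], num_reducers: int) -> List[Row]:
    """Sort rows by num descending within each reducer, then concatenate."""
    partitions: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[toy_partition(row[0], num_reducers)].append(row)

    output: List[Row] = []
    for part in partitions:
        # Each reducer emits its own rows in sorted order.
        output.extend(sorted(part, key=lambda r: r[1], reverse=True))
    return output


def is_sorted_desc(rows: List[Row]) -> bool:
    return all(a[1] >= b[1] for a, b in zip(rows, rows[1:]))


rows = [("a", 5), ("b", 9), ("c", 7), ("d", 1)]
two_reducers = sort_by_per_reducer(rows, 2)  # two sorted runs, back to back
one_reducer = sort_by_per_reducer(rows, 1)   # a single, fully sorted run
```

With two reducers each run is ordered, but the concatenation is not. With one reducer the output is totally ordered.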


It makes sense that SORT BY is only per-reducer. The fact that the presence or absence of
a LIMIT clause changes the result set in ways other than the number of records returned,
though, seemed like a bug to me.

It's an easy enough workaround though, so maybe this issue should be closed?

> SORT BY with GROUP ignored without LIMIT
> ----------------------------------------
>
>                 Key: HIVE-647
>                 URL: https://issues.apache.org/jira/browse/HIVE-647
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when a LIMIT
> is not supplied. If I run the following two queries, the first returns properly sorted
> results. The second does not.
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC;
> Explain is different for the two queries as well. The first uses 3 M/R jobs and the second
> only uses 2, which might be part of the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

