hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
Date Sat, 02 Feb 2013 17:32:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569600#comment-13569600
] 

Ashutosh Chauhan commented on HIVE-3972:
----------------------------------------

I think this optimization will become more useful if it also considers the limit in query,
since in most cases queries order-by is accompanied by limit. So, we can stop fetching and
merging the results as soon as we get number of records in limit clause. Or does this already
takes limit in account ?
                
> Support using multiple reducer for fetching order by results
> ------------------------------------------------------------
>
>                 Key: HIVE-3972
>                 URL: https://issues.apache.org/jira/browse/HIVE-3972
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-3972.D8349.1.patch
>
>
> Queries for fetching results which have lastly "order by" clause make final MR run with
single reducer, which can be too much. For example, 
> {code}
> select value, sum(key) as sum from src group by value order by sum;
> {code}
> If number of reducer is reasonable, multiple result files could be merged into single
sorted stream in the fetcher level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message