hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
Date Tue, 12 Feb 2013 18:01:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576831#comment-13576831 ]

Ashutosh Chauhan commented on HIVE-3972:
----------------------------------------

[~navis] I agree HIVE-3562 is an orthogonal issue that will make what I am suggesting less
of a problem, but there are still some cases it does not cover. As is being discussed on
HIVE-3562, consider the following query:
{code}
select value, sum(key) as sum from src group by value order by value limit 10;
{code}
In this case, the limit can't be pushed into the map phase, so the HIVE-3562 optimization
won't kick in. With the patch as it currently stands on this jira, we will generate one MR job
with multiple reducers and then do the order-by on the client in the Fetch task. Here, if you
don't take advantage of the fact that the query has a limit, you might read millions of rows
from HDFS, bring all of them into client memory, and then show just 10 to the user. If you
instead take the limit into account and stop merging and reading as soon as you have seen
10 rows, you save both HDFS IO and client memory. Make sense?
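
A minimal sketch of the limit-aware merge described above, written in Python rather than Hive's Java and not taken from the patch. It assumes each reducer's output is already sorted on the order-by key; `merge_with_limit`, `counting_reader`, and `reducer_outputs` are hypothetical names used only for illustration.

```python
import heapq
from itertools import islice


def merge_with_limit(sorted_streams, limit, key=None):
    """Lazily k-way merge already-sorted streams, stopping after `limit` rows."""
    # heapq.merge pulls rows on demand, so unread rows are never fetched.
    merged = heapq.merge(*sorted_streams, key=key)
    return list(islice(merged, limit))


def counting_reader(rows, counter):
    """Hypothetical stand-in for reading one reducer's output file.

    Increments counter[0] for every row actually pulled from the source.
    """
    for row in rows:
        counter[0] += 1
        yield row


rows_read = [0]
# Three pretend reducer outputs, each sorted by `value` (the first field).
reducer_outputs = [
    counting_reader([("a", 1), ("d", 4), ("g", 7)], rows_read),
    counting_reader([("b", 2), ("e", 5), ("h", 8)], rows_read),
    counting_reader([("c", 3), ("f", 6), ("i", 9)], rows_read),
]

# Equivalent in spirit to "order by value limit 2" over the merged outputs.
top = merge_with_limit(reducer_outputs, limit=2, key=lambda row: row[0])
```

Because the merge is lazy, `rows_read[0]` stays well below the nine rows available: only the head of each stream, plus whatever is needed to produce the requested rows, is ever read.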
                
> Support using multiple reducer for fetching order by results
> ------------------------------------------------------------
>
>                 Key: HIVE-3972
>                 URL: https://issues.apache.org/jira/browse/HIVE-3972
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch
>
>
> Queries for fetching results that end with an "order by" clause make the final MR job run
> with a single reducer, which can be too much for it. For example,
> {code}
> select value, sum(key) as sum from src group by value order by sum;
> {code}
> If the number of reducers is reasonable, the multiple result files could be merged into a
> single sorted stream at the fetcher level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
