hive-dev mailing list archives

From "Navis (JIRA)" <>
Subject [jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
Date Mon, 04 Feb 2013 01:23:11 GMT


Navis commented on HIVE-3972:

Top-K optimization is already covered by HIVE-3562 and seems orthogonal to this issue. Top-K
will also make this even less useful.
The reason I made this is that there are so many statements like 'Hive is quite inefficient
at handling order by because it is run by a single reducer', and I just hate hearing that. This
is not an important issue, but it can be a starting point for other optimizations that exploit
the ordered traits of multiple bucket files.

The added configuration sets the number of reducers for the final order-by MR stage. -1 means
the number is decided by the usual calculation. 0 disables this feature.
> Support using multiple reducer for fetching order by results
> ------------------------------------------------------------
>                 Key: HIVE-3972
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-3972.D8349.1.patch
> Queries that fetch results and end with an "order by" clause make the final MR stage run
> with a single reducer, which can be too much work for one reducer. For example,
> {code}
> select value, sum(key) as sum from src group by value order by sum;
> {code}
> If the number of reducers is reasonable, the multiple result files could be merged into a
> single sorted stream at the fetcher level.
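
The fetcher-level merge described above is essentially a k-way merge of already sorted inputs. The following is only a minimal sketch of that idea in Python, not the code from the attached patch: the function name `merge_sorted_outputs`, the `Row` alias, and the sample rows are all hypothetical, and it assumes each reducer's output is already sorted by the `sum` column.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# (value, sum), mirroring the two columns of the example query.
Row = Tuple[str, int]


def merge_sorted_outputs(reducer_outputs: List[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge per-reducer results into one stream ordered by sum.

    Assumes every input is already sorted by its second field.
    """
    return heapq.merge(*reducer_outputs, key=lambda row: row[1])


# Hypothetical contents of three reducer output files, each sorted by sum.
part_0 = [("a", 1), ("d", 7), ("g", 20)]
part_1 = [("b", 3), ("e", 9)]
part_2 = [("c", 4), ("f", 15), ("h", 42)]

merged = list(merge_sorted_outputs([part_0, part_1, part_2]))
print(merged)
```

Because the merge only keeps one pending row per input, the fetcher would not need to hold the full result in memory. No range partitioning across reducers is required for this to work, only that each file is sorted on its own.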
