hive-dev mailing list archives

From "Thejas M Nair (JIRA)" <>
Subject [jira] [Commented] (HIVE-5093) Use a combiner for LIMIT with GROUP BY and ORDER BY operators
Date Mon, 19 Aug 2013 17:09:48 GMT


Thejas M Nair commented on HIVE-5093:

[~gopalv] I agree this is going to be very useful with <human-number> limits.

[~appodictic] I think limit queries are fairly common, both for analytical queries and when
people are iteratively trying out their queries and want to quickly check whether the queries are
working as expected. This optimization can lead to a significant performance boost for such
queries, and the code change required is not significant, as demonstrated by the attached WIP patch.

I agree that we should look at different ways of adding this optimization. 

As Gopal suggested, using a different sort function for map is one option for the order-by case.

The fact that Hive uses map-side aggregation is something to consider for optimizing the group-by
case. One option would be to push the limit into the map-side aggregation operator, which would
also reduce its memory requirements. But that is probably a little more complicated than this
change, which is better factored out into separate combiner code.
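
To make the map-side option concrete, here is a minimal sketch in plain Python rather than Hive's operator code. It assumes the limit applies to groups in sorted key order and uses a count as the aggregate. The function name map_side_count_with_limit is hypothetical. Each mapper keeps at most n groups in memory, and a key that is evicted can never re-enter, so the partial counts for the surviving keys stay complete.

```python
def map_side_count_with_limit(keys, n):
    """Count rows per key, keeping only the n smallest keys in memory.

    Assumption (not from the message): the final LIMIT n applies to
    groups in sorted key order, so a key that is not among the n
    smallest keys seen by this mapper cannot appear in the final result.
    """
    counts = {}
    for key in keys:
        if key in counts:
            counts[key] += 1
        elif len(counts) < n:
            counts[key] = 1
        elif n > 0:
            # Table is full: keep the new key only if it sorts before
            # the largest key currently held, evicting that largest key.
            largest = max(counts)
            if key < largest:
                del counts[largest]
                counts[key] = 1
    return sorted(counts.items())
```

The linear max() scan keeps the sketch short; a sorted map would be the natural choice for large n.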

> Use a combiner for LIMIT with GROUP BY and ORDER BY operators
> -------------------------------------------------------------
>                 Key: HIVE-5093
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-5093-WIP-01.patch
> Operator trees of the following structure can have a memory-friendly combiner put in place after the sort phase:
> "GBY-LIM" and "OBY-LIM"
> This will cut down on I/O when spilling to disk and particularly during the merge phase of the reducer.
> There are two possible combiners - LimitNKeysCombiner and LimitNValuesCombiner.
> The first one would be ideal for the GROUP-BY case, while the latter would be more useful for the ORDER-BY case.
> The combiners are still relevant even if there are 1:1 forward operators on the reducer side and, for small data items, the MR base layer does not run the combiners at all.
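
The difference between the two combiners named in the description can be sketched in a few lines. This is plain Python over an already sorted stream of (key, value) pairs, not the Hadoop combiner API and not the code in the attached patch. The function names limit_n_keys and limit_n_values are hypothetical stand-ins for LimitNKeysCombiner and LimitNValuesCombiner.

```python
from itertools import groupby, islice
from operator import itemgetter


def limit_n_keys(sorted_pairs, n):
    """GROUP-BY style: keep every record of the first n distinct keys."""
    groups = groupby(sorted_pairs, key=itemgetter(0))
    for _key, records in islice(groups, n):
        yield from records


def limit_n_values(sorted_pairs, n):
    """ORDER-BY style: keep only the first n records overall."""
    return islice(sorted_pairs, n)


pairs = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]
print(list(limit_n_keys(pairs, 2)))    # all records for keys "a" and "b"
print(list(limit_n_values(pairs, 2)))  # the first two records only
```

In this sketch the key-based variant must pass through all values of a retained key, because a group-by still needs them to finish the aggregate, while the value-based variant can stop after n records.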

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
