hive-dev mailing list archives

From "Gopal V (JIRA)" <>
Subject [jira] [Updated] (HIVE-5093) Use a combiner for LIMIT with GROUP BY and ORDER BY operators
Date Wed, 14 Aug 2013 22:03:49 GMT


Gopal V updated HIVE-5093:

    Attachment: HIVE-5093-WIP-01.patch

Rebased the WIP patch to trunk.

This does mix a mapred.Reducer reference into MapWork, which is not a clean split.

I would like some advice on how to do that with Tez in mind.

> Use a combiner for LIMIT with GROUP BY and ORDER BY operators
> -------------------------------------------------------------
>                 Key: HIVE-5093
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-5093-WIP-01.patch
> Operator trees of the following structures can have a memory-friendly combiner
> put in place after the sort phase:
> "GBY-LIM" and "OBY-LIM"
> This will cut down on I/O when spilling to disk and particularly during the
> merge phase of the reducer.
> There are two possible combiners - LimitNKeysCombiner and LimitNValuesCombiner.
> The first one would be ideal for the GROUP-BY case, while the latter would be
> more useful for the ORDER-BY case.
> The combiners are still relevant even if there are 1:1 forward operators on the
> reducer side. For small data items, the MR base layer does not run the
> combiners at all.
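
For illustration only, here is a minimal sketch of what the two combiners described above might do to one key-sorted run of records. It is not taken from the attached patch: the class name LimitCombinerSketch and the methods limitNKeys and limitNValues are hypothetical, and the sketch works on plain in-memory lists rather than the Hadoop combiner API. It assumes the run is already sorted by key, so anything past the first N keys (or first N rows) of a run cannot be part of the final LIMIT N result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the code in HIVE-5093-WIP-01.patch.
class LimitCombinerSketch {

    // GBY-LIM case: keep every record belonging to the first n distinct keys
    // of a key-sorted run, and drop everything after them.
    static <K, V> List<Map.Entry<K, V>> limitNKeys(List<Map.Entry<K, V>> sortedRun, int n) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        int distinctKeys = 0;
        K previousKey = null;
        boolean first = true;
        for (Map.Entry<K, V> record : sortedRun) {
            boolean newKey = first || !Objects.equals(previousKey, record.getKey());
            if (newKey) {
                if (distinctKeys == n) {
                    break; // an (n+1)-th key starts here; it cannot reach the result
                }
                distinctKeys++;
                previousKey = record.getKey();
                first = false;
            }
            out.add(record);
        }
        return out;
    }

    // OBY-LIM case: keep only the first n records of a sorted run,
    // regardless of how many distinct keys they span.
    static <K, V> List<Map.Entry<K, V>> limitNValues(List<Map.Entry<K, V>> sortedRun, int n) {
        int end = Math.min(Math.max(n, 0), sortedRun.size());
        return new ArrayList<>(sortedRun.subList(0, end));
    }
}
```

With a run of (a,1), (a,2), (b,3), (c,4) and a limit of 2, limitNKeys keeps the three records for keys a and b, while limitNValues keeps only the first two records. Either way less data is spilled and merged, which is the I/O saving the description refers to.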

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
