hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
Date Tue, 09 Sep 2014 17:02:29 GMT


Ashutosh Chauhan commented on HIVE-7405:

Do we really need the AggregreateMapReduceUsage enum? It seems GroupbyDesc.Mode could be used
instead, as follows:
AggregreateMapReduceUsage.MAP -> Mode.Hash
AggregreateMapReduceUsage.REDUCE -> Mode.MergePartial
AggregreateMapReduceUsage.MAP_REDUCE -> all other Mode values

If possible, we should reuse GroupbyDesc.Mode. Otherwise the two enums can be mixed and matched
independently, which will lead to an explosion of combinations.
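As a rough illustration of the suggestion, the Java sketch below derives the map/reduce usage from the group-by mode instead of storing a second, independent enum. Mode, AggregationSide and ModeMapping are hypothetical, self-contained stand-ins (Mode's constants are assumed to mirror GroupbyDesc.Mode); they are not Hive's actual classes.

```java
// Hypothetical stand-in for GroupbyDesc.Mode (constant names assumed).
enum Mode { COMPLETE, PARTIAL1, PARTIAL2, PARTIALS, FINAL, HASH, MERGEPARTIAL }

// Hypothetical derived value that would replace a separately stored
// AggregreateMapReduceUsage field.
enum AggregationSide { MAP, REDUCE, MAP_REDUCE }

final class ModeMapping {
    private ModeMapping() {}

    // Mapping proposed in the comment:
    //   MAP        <- Mode.HASH
    //   REDUCE     <- Mode.MERGEPARTIAL
    //   MAP_REDUCE <- every other Mode value
    static AggregationSide sideFor(Mode mode) {
        switch (mode) {
            case HASH:
                return AggregationSide.MAP;
            case MERGEPARTIAL:
                return AggregationSide.REDUCE;
            default:
                return AggregationSide.MAP_REDUCE;
        }
    }
}
```

Because the side is computed from the mode, a plan has only one field to set and can never carry a contradictory pair such as HASH together with REDUCE.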

> Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
> ------------------------------------------------------
>                 Key: HIVE-7405
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Vectorization
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>         Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch,
>                      HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch,
>                      HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch,
>                      HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch,
>                      HIVE-7405.994.patch, HIVE-7405.995.patch
> Vectorize the basic case, which does not have any COUNT DISTINCT aggregation.
> Add a 4th processing mode in VectorGroupByOperator for the reduce side, where each input VectorizedRowBatch
> contains values for only one key at a time. Because every row in the batch shares that key, its values can be aggregated quickly.
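A minimal sketch of the 4th processing mode described above, under stated assumptions: each batch carries a single key and one long value column, batches arrive grouped by key (a key may span consecutive batches), and the aggregate is SUM. SingleKeyBatch and ReduceSideSumAggregator are hypothetical simplifications, not Hive's VectorizedRowBatch or VectorGroupByOperator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified reduce-side batch: one key shared by all rows.
final class SingleKeyBatch {
    final String key;     // the one grouping key for every row in the batch
    final long[] values;  // value column
    final int size;       // number of valid rows in values

    SingleKeyBatch(String key, long[] values, int size) {
        this.key = key;
        this.values = values;
        this.size = size;
    }
}

// Hypothetical SUM aggregator for single-key batches.
final class ReduceSideSumAggregator {
    private String currentKey = null;
    private long currentSum = 0;
    private final List<String> outKeys = new ArrayList<>();
    private final List<Long> outSums = new ArrayList<>();

    void process(SingleKeyBatch batch) {
        if (batch.size == 0) {
            return;
        }
        // A new key means the previous group is complete (input assumed grouped by key).
        if (currentKey != null && !currentKey.equals(batch.key)) {
            flush();
        }
        currentKey = batch.key;
        // All rows share currentKey, so no per-row key lookup is needed:
        // a tight loop over the value column suffices.
        long sum = 0;
        for (int i = 0; i < batch.size; i++) {
            sum += batch.values[i];
        }
        currentSum += sum;
    }

    // Emit the last open group at end of input.
    void close() {
        if (currentKey != null) {
            flush();
        }
    }

    private void flush() {
        outKeys.add(currentKey);
        outSums.add(currentSum);
        currentKey = null;
        currentSum = 0;
    }

    List<String> keys() { return outKeys; }
    List<Long> sums() { return outSums; }
}
```

For example, batches ("a", {1, 2, 3}), ("a", {4}) and ("b", {10, 20}) yield the groups a = 10 and b = 30.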

This message was sent by Atlassian JIRA
