hive-dev mailing list archives

From "He Yongqiang (JIRA)" <>
Subject [jira] Updated: (HIVE-931) Sorted Group By
Date Thu, 03 Dec 2009 23:22:21 GMT


He Yongqiang updated HIVE-931:

    Attachment: hive-931-2009-12-03.patch

Attached a new patch. Had a lot of offline discussions with Namit. Thanks Namit!

Finally, we changed the rule. We will transform a group by into a sort-based group by when
either of the following holds:

1) The table's sort columns are empty, and its bucket columns contain all of the group by
columns and nothing else (order does not matter).

2) The table's sort columns are not empty, and the group by columns are a prefix subset of
the sort columns. For example, if sorted by a,b,c, group by
b,a,c ..
are all ok.
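
The two rules above can be sketched as a small check. This is only an illustration in Python, not the code in the attached patch (Hive itself is written in Java). The function name `can_use_sorted_group_by` and its list-of-column-names arguments are hypothetical. The reading of "prefix subset" as "the group by columns, in any order, equal the first k sort columns" is an assumption based on the a,b,c example, and so is returning False when there are no group by columns.

```python
def can_use_sorted_group_by(sort_cols, bucket_cols, group_by_cols):
    """Return True if a group by could be turned into a sort-based group by.

    Hypothetical helper: all three arguments are lists of column names.
    """
    group_set = set(group_by_cols)
    if not group_set:
        # Assumption: a query without group by columns is out of scope here.
        return False

    if not sort_cols:
        # Rule 1: no sort columns, so the bucket columns must contain
        # exactly the group by columns (order does not matter).
        return set(bucket_cols) == group_set

    # Rule 2 (assumed reading): the group by columns, in any order,
    # must be the same set as the first k sort columns.
    k = len(group_set)
    return k <= len(sort_cols) and set(sort_cols[:k]) == group_set
```

With sort columns a,b,c this accepts group by b,a,c and group by a, and rejects group by b,c because b,c is not a prefix of the sort columns.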

> Sorted Group By
> ---------------
>                 Key: HIVE-931
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>         Attachments: hive-931-2009-11-18.patch, hive-931-2009-11-19.patch, hive-931-2009-11-20.3.patch,
>                      hive-931-2009-11-21.patch, hive-931-2009-12-01.patch, hive-931-2009-12-03.patch
> If the table is sorted by a given key, we don't use that for group by. That can be very
> For eg: if T is sorted by column c1,
> For select c1, aggr() from T group by c1
> we always use a single map-reduce job. No hash table is needed on the mapper, since the
> data is sorted by c1 anyway.
> This will reduce the memory pressure on the mapper and also remove overhead of maintaining
> the hash table.
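
To illustrate why sorted input removes the need for a hash table, here is a minimal sketch in Python (not Hive's implementation). It assumes the rows arrive as (c1, value) pairs already sorted by c1, and it uses a sum as a stand-in for aggr(). The name `sorted_group_by_sum` is hypothetical.

```python
def sorted_group_by_sum(rows):
    """Yield (key, total) pairs for rows that are already sorted by key.

    Only one running total is kept at a time, so memory use does not
    grow with the number of distinct keys.
    """
    started = False
    current_key = None
    total = 0
    for key, value in rows:
        if started and key != current_key:
            # The key changed, so the previous group is complete.
            yield current_key, total
            total = 0
        current_key = key
        total += value
        started = True
    if started:
        # Emit the last group.
        yield current_key, total
```

A hash-based group by would instead keep one entry per distinct key in memory until the input is exhausted, which is the memory pressure the description refers to.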

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
