hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-931) Sorted Group By
Date Wed, 02 Dec 2009 06:04:20 GMT

     [ https://issues.apache.org/jira/browse/HIVE-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

He Yongqiang updated HIVE-931:

    Attachment: hive-931-2009-12-01.patch

Updated the patch, and took sort columns into account.

     * We use bucket columns only when the sorted column set is empty or the
     * sorted column set is an exact prefix match of the bucket columns. For example,
     * suppose a table is bucketed by columns a, b, and c, and a query groups by
     * a, b, c. If the table's sort columns are null, or are [a], [a,b], or [a,b,c],
     * we can use the 'sorted groupby' by looking at the bucket columns.
     * If we cannot decide by looking at the bucket columns and the table
     * has sort columns, we fall back to the sort columns. We can use bucket group by
     * if the groupby column set is an exact prefix match of the sort columns.
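A minimal Java sketch of the column-matching rule in the comment, under stated assumptions: bucket and sort columns are ordered lists of column names, "using the bucket columns" is taken to mean the group-by column set equals the bucket column set (the case in the comment's example), and "exact prefix match of the sort columns" is read as the group-by set equalling the set of the first k sort columns. `SortedGroupByCheck` and `canUseSortedGroupBy` are hypothetical names, not Hive's actual optimizer API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the column-matching rule; not Hive's optimizer code.
class SortedGroupByCheck {

    // True when `prefix` equals the first prefix.size() entries of `cols`, in order.
    static boolean isPrefix(List<String> prefix, List<String> cols) {
        return prefix.size() <= cols.size()
                && cols.subList(0, prefix.size()).equals(prefix);
    }

    // True when the group-by column set equals the set of the first k sort
    // columns, where k is the number of distinct group-by columns.
    static boolean isSetPrefixOf(Set<String> groupBy, List<String> sortCols) {
        int k = groupBy.size();
        return k > 0 && k <= sortCols.size()
                && new HashSet<>(sortCols.subList(0, k)).equals(groupBy);
    }

    // bucketCols and groupByCols are assumed non-null (an empty bucketCols
    // means "not bucketed"); sortCols may be null or empty.
    static boolean canUseSortedGroupBy(List<String> bucketCols,
                                       List<String> sortCols,
                                       List<String> groupByCols) {
        Set<String> groupBy = new HashSet<>(groupByCols);
        boolean noSortCols = sortCols == null || sortCols.isEmpty();

        // Step 1: consult the bucket columns only if the sort columns are
        // empty or an exact prefix of the bucket columns.
        if (noSortCols || isPrefix(sortCols, bucketCols)) {
            if (!bucketCols.isEmpty() && groupBy.equals(new HashSet<>(bucketCols))) {
                return true;
            }
        }
        // Step 2: fall back to the sort columns, if the table has any.
        return !noSortCols && isSetPrefixOf(groupBy, sortCols);
    }

    public static void main(String[] args) {
        List<String> bucket = List.of("a", "b", "c");
        List<String> groupBy = List.of("a", "b", "c");
        System.out.println(canUseSortedGroupBy(bucket, List.of(), groupBy));         // true
        System.out.println(canUseSortedGroupBy(bucket, List.of("a", "b"), groupBy)); // true
        System.out.println(canUseSortedGroupBy(bucket, List.of("b"), groupBy));      // false
    }
}
```

The two steps are kept separate so the bucket-column check short-circuits before the sort-column fallback, mirroring the order the comment describes.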

> Sorted Group By
> ---------------
>                 Key: HIVE-931
>                 URL: https://issues.apache.org/jira/browse/HIVE-931
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>         Attachments: hive-931-2009-11-18.patch, hive-931-2009-11-19.patch, hive-931-2009-11-20.3.patch,
>                      hive-931-2009-11-21.patch, hive-931-2009-12-01.patch
> If the table is sorted by a given key, we don't use that for group by. That can be very
> For example, if T is sorted by column c1, then for
>     select c1, aggr() from T group by c1
> we always use a single map-reduce job. No hash table is needed on the mapper, since the
> data is sorted by c1 anyway.
> This will reduce the memory pressure on the mapper and also remove the overhead of maintaining
> the hash table.
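Why sorted input removes the need for a mapper-side hash table can be illustrated with a streaming (sort-based) aggregation: because all rows for one c1 value are contiguous, only the current key and its running aggregate are kept, and a group is emitted when the key changes. This is a minimal sketch, assuming SUM as a stand-in for aggr(); `StreamingGroupBy`, `Row`, `Result`, and `aggregateSorted` are hypothetical names, not Hive code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of a sort-based (streaming) group by; not Hive code.
class StreamingGroupBy {

    // One input row: the group-by key c1 and a value to aggregate.
    static final class Row {
        final String c1;
        final long value;
        Row(String c1, long value) { this.c1 = c1; this.value = value; }
    }

    // One output row: a key and its aggregate (SUM stands in for aggr()).
    static final class Result {
        final String c1;
        final long sum;
        Result(String c1, long sum) { this.c1 = c1; this.sum = sum; }
        @Override public String toString() { return c1 + "\t" + sum; }
    }

    // Assumes rows arrive sorted by c1, so all rows of a group are contiguous.
    // Only the current key and its running sum are held in memory; no hash
    // table keyed by c1 is needed. Results are collected into a list here for
    // illustration; a mapper would emit each one as soon as its group closes.
    static List<Result> aggregateSorted(Iterable<Row> sortedRows) {
        List<Result> out = new ArrayList<>();
        boolean haveKey = false;
        String currentKey = null;
        long runningSum = 0;
        for (Row r : sortedRows) {
            if (haveKey && !Objects.equals(r.c1, currentKey)) {
                // Key changed: the previous group is complete.
                out.add(new Result(currentKey, runningSum));
                runningSum = 0;
            }
            currentKey = r.c1;
            haveKey = true;
            runningSum += r.value;
        }
        if (haveKey) {
            out.add(new Result(currentKey, runningSum)); // flush the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 1), new Row("a", 2), new Row("b", 5));
        for (Result res : aggregateSorted(rows)) {
            System.out.println(res);
        }
    }
}
```

If the input were not sorted, the same one-pass loop would split a key into several partial groups, which is why the unsorted case needs a hash table on the mapper (or a shuffle to the reducers) instead.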

This message is automatically generated by JIRA.
