hive-dev mailing list archives

From "Namit Jain (JIRA)"
Subject [jira] Created: (HIVE-931) Sorted Group By
Date Fri, 13 Nov 2009 20:22:39 GMT
Sorted Group By

                 Key: HIVE-931
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: He Yongqiang
             Fix For: 0.5.0

If a table is sorted by a given key, we currently do not take advantage of that ordering for group by. Doing so can be very useful.

For example, if T is sorted by column c1, then for

  select c1, aggr() from T group by c1

we always use a single map-reduce job. No hash table is needed on the mapper, since the data
is already sorted by c1.

This will reduce memory pressure on the mapper and also remove the overhead of maintaining
the hash table.
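
To illustrate why sorted input removes the need for a hash table, here is a minimal sketch of streaming aggregation in Python. It is not Hive's implementation; the function name sorted_group_by and its init/step parameters are hypothetical, chosen only for this example. The sketch assumes rows arrive as (key, value) pairs already sorted by key, so only one running aggregate is held in memory at a time.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
A = TypeVar("A")


def sorted_group_by(
    rows: Iterable[Tuple[K, V]],
    init: Callable[[], A],
    step: Callable[[A, V], A],
) -> Iterator[Tuple[K, A]]:
    """Yield (key, aggregate) pairs for rows that are already sorted by key.

    Hypothetical helper for illustration only. Because equal keys are
    adjacent, a group is complete as soon as the key changes, so no hash
    table of per-key state is required.
    """
    have_group = False
    current_key = None
    acc = None
    for key, value in rows:
        if have_group and key != current_key:
            # Key changed: the previous group is finished, emit it.
            yield current_key, acc
            have_group = False
        if not have_group:
            # Start a fresh aggregate for the new key.
            current_key, acc, have_group = key, init(), True
        acc = step(acc, value)
    if have_group:
        # Emit the final group once the input is exhausted.
        yield current_key, acc


# Example: sum per key over input sorted by key.
rows = [("a", 1), ("a", 2), ("b", 5)]
print(list(sorted_group_by(rows, lambda: 0, lambda a, v: a + v)))
# prints [('a', 3), ('b', 5)]
```

A hash-based group by would instead keep one aggregate per distinct key in memory until the end of the input, which is the memory pressure described above.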

