hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-931) Sorted Group By
Date Sat, 21 Nov 2009 02:15:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780886#action_12780886 ]

Namit Jain commented on HIVE-931:

1. Add new parameter in hive-default.xml
2. Utilities.java: change function names - extractColumnNamesFromSortCols
   variable name: bucketCol: line 813
3. Remove all the tabs
4. Given the fact that we are not doing this optimization across sub-queries right now,
   would it be simpler to maintain the group by operator to table mapping via a separate walker
   instead of getting it while generating the group by operator? I am fine with the current
   approach also; this is just a question.
5. You are still doing partition pruning in GroupByOptimizer. Why can't we reuse the mapping
   in ParseContext? That was the whole reason for storing it in ParseContext.

Sorry about being so picky...

> Sorted Group By
> ---------------
>                 Key: HIVE-931
>                 URL: https://issues.apache.org/jira/browse/HIVE-931
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>         Attachments: hive-931-2009-11-18.patch, hive-931-2009-11-19.patch, hive-931-2009-11-20.3.patch
> If the table is sorted by a given key, we don't use that for group by. That can be very
> For example, if T is sorted by column c1,
> for select c1, aggr() from T group by c1
> we always use a single map-reduce job. No hash table is needed on the mapper, since the data is sorted by c1 anyway.
> This will reduce the memory pressure on the mapper and also remove the overhead of maintaining the hash table.
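
To illustrate why sorted input removes the need for a hash table, here is a minimal sketch in plain Java. It is not taken from the HIVE-931 patch and uses no Hive classes; the names SortedGroupBySketch, GroupCount and countSorted are hypothetical, and count() stands in for aggr(). Because equal keys arrive next to each other, the loop only keeps the current key and its running aggregate, and emits a row whenever the key changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hive or of the HIVE-931 patch.
class SortedGroupBySketch {

    /** One output row: a group key and its aggregate (a row count here). */
    static final class GroupCount {
        final String key;
        final long count;

        GroupCount(String key, long count) {
            this.key = key;
            this.count = count;
        }

        @Override
        public String toString() {
            return key + "=" + count;
        }
    }

    /**
     * Streaming group-by over keys that are already sorted.
     * Assumes non-null keys. Memory use is one key plus one counter,
     * independent of the number of distinct keys.
     */
    static List<GroupCount> countSorted(List<String> sortedKeys) {
        List<GroupCount> out = new ArrayList<>();
        String currentKey = null;
        long count = 0;
        for (String key : sortedKeys) {
            // A key change means the previous group is complete: emit it.
            if (count > 0 && !key.equals(currentKey)) {
                out.add(new GroupCount(currentKey, count));
                count = 0;
            }
            currentKey = key;
            count++;
        }
        // Emit the last open group, if any rows were seen.
        if (count > 0) {
            out.add(new GroupCount(currentKey, count));
        }
        return out;
    }

    public static void main(String[] args) {
        // Column c1 of T, already sorted.
        List<String> c1 = Arrays.asList("a", "a", "b", "c", "c", "c");
        System.out.println(countSorted(c1)); // prints [a=2, b=1, c=3]
    }
}
```

A hash-based group-by would instead hold one entry per distinct key until the end of the input, which is the memory pressure the description refers to.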

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
