hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by
Date Sun, 11 Jan 2009 20:04:59 GMT
when using map-side aggregates - perform single map-reduce group-by
-------------------------------------------------------------------

                 Key: HIVE-223
                 URL: https://issues.apache.org/jira/browse/HIVE-223
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


Today, even when we do map-side aggregates, we run multiple map-reduce jobs. However, the
reason for doing multiple map-reduce group-bys (for a single group-by) was the fear of skew.
When we are doing map-side aggregates, skew should for the most part not exist. There are
two possible reasons for skew:
- a large number of entries for a single grouping set: map-side aggregates should take care
of this
- a bad hash function that sends too much data to one reducer: we should be able to take
care of this by using good hash functions (and prime-number reducer counts)
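To illustrate why a prime reducer count helps with the second case, here is a small Java sketch. It assumes Hadoop's default HashPartitioner formula, (hashCode & Integer.MAX_VALUE) % numReducers, and integer keys whose hash codes all share a factor of 4 (Integer.hashCode() returns the value itself). With 8 reducers only two ever receive data; with 7 (a prime) all seven do. The class and method names are hypothetical, and a prime count only helps when the hash codes share a factor with the reducer count; it does not fix a hash that maps many keys to the same value.

```java
public class PartitionSkewDemo {
    // Same formula as Hadoop's default HashPartitioner.
    static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sends keys 0, stride, 2*stride, ... to reducers and counts
    // how many distinct reducers receive at least one key.
    static int reducersUsed(int numKeys, int stride, int numReducers) {
        int[] load = new int[numReducers];
        for (int i = 0; i < numKeys; i++) {
            Integer key = i * stride; // Integer.hashCode() is the value itself
            load[partition(key, numReducers)]++;
        }
        int used = 0;
        for (int count : load) {
            if (count > 0) used++;
        }
        return used;
    }

    public static void main(String[] args) {
        // Every key is a multiple of 4, so with 8 reducers only slots 0 and 4 are hit.
        System.out.println("8 reducers used: " + reducersUsed(1000, 4, 8)); // 2
        // 4 and 7 are coprime, so every reducer gets a share.
        System.out.println("7 reducers used: " + reducersUsed(1000, 4, 7)); // 7
    }
}
```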

So I think we should be able to do a single-stage map-reduce group-by when doing map-side aggregates.
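A minimal in-memory sketch of the proposed single-stage plan, assuming a SUM aggregate: each mapper hash-aggregates its own split into partial sums, emits one record per distinct key, and those records are hash-partitioned straight to the reducers, which merge the partials. This is not Hive's implementation; all names (SingleStageGroupBy, Row, Result, skewedSplits) are hypothetical. With 4 mappers that each see 1000 rows of a "hot" key plus two other keys, only 12 records are shuffled, whereas without map-side aggregation all 4008 rows would be shuffled and 4000 of them would land on the single reducer that owns "hot".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SingleStageGroupBy {
    record Row(String key, long value) {}

    record Result(Map<String, Long> totals, long shuffledRecords) {}

    // Map side: hash-aggregate one mapper's rows into partial sums.
    static Map<String, Long> mapSideAggregate(List<Row> split) {
        Map<String, Long> partial = new HashMap<>();
        for (Row r : split) partial.merge(r.key(), r.value(), Long::sum);
        return partial;
    }

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // One map-reduce stage: partial aggregates go directly to their reducer.
    static Result run(List<List<Row>> splits, int numReducers) {
        List<Map<String, Long>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) reducers.add(new HashMap<>());
        long shuffled = 0;
        for (List<Row> split : splits) {
            for (Map.Entry<String, Long> e : mapSideAggregate(split).entrySet()) {
                shuffled++;
                // Reduce side: merge partial sums for the same key.
                reducers.get(partition(e.getKey(), numReducers))
                        .merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> r : reducers) totals.putAll(r); // each key lives on exactly one reducer
        return new Result(totals, shuffled);
    }

    // Builds splits where one key ("hot") dominates every mapper's input.
    static List<List<Row>> skewedSplits(int mappers, int hotRowsPerMapper) {
        List<List<Row>> splits = new ArrayList<>();
        for (int m = 0; m < mappers; m++) {
            List<Row> split = new ArrayList<>();
            for (int i = 0; i < hotRowsPerMapper; i++) split.add(new Row("hot", 1));
            split.add(new Row("a", 1));
            split.add(new Row("b", 1));
            splits.add(split);
        }
        return splits;
    }

    public static void main(String[] args) {
        Result res = run(skewedSplits(4, 1000), 7);
        System.out.println(res.totals().get("hot")); // 4000
        System.out.println(res.shuffledRecords());   // 12
    }
}
```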

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

