hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-609) optimize multi-group by
Date Mon, 06 Jul 2009 19:59:14 GMT
optimize multi-group by 
------------------------

                 Key: HIVE-609
                 URL: https://issues.apache.org/jira/browse/HIVE-609
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain


For a query like:

from src
insert overwrite table dest1 select col1, count(distinct colx) group by col1
insert overwrite table dest2 select col2, count(distinct colx) group by col2;



If map-side aggregation is turned off, we currently run 4 map-reduce jobs.
The plan can be optimized to run in 3 map-reduce jobs by spraying over the
distinct column first and then aggregating the individual results.
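
The 3-job plan can be illustrated with a small Python simulation. This is only
a sketch of one reading of the proposal, not Hive's actual implementation: the
function names, the reducer count, and the in-memory "jobs" are all
hypothetical. Job 1 partitions rows by colx, so every occurrence of a given
colx value lands on the same reducer and each reducer can emit partial
distinct counts for both col1 and col2. Jobs 2 and 3 then sum those partial
counts per col1 and per col2.

```python
from collections import defaultdict

def spray_by_distinct_column(rows, num_reducers):
    """Job 1 shuffle (assumed): partition rows by colx only."""
    partitions = [[] for _ in range(num_reducers)]
    for col1, col2, colx in rows:
        partitions[hash(colx) % num_reducers].append((col1, col2, colx))
    return partitions

def partial_distinct_counts(partition):
    """Job 1 reduce (assumed): distinct colx per col1 and per col2 in one partition."""
    pairs1 = {(col1, colx) for col1, _, colx in partition}
    pairs2 = {(col2, colx) for _, col2, colx in partition}
    partial1, partial2 = defaultdict(int), defaultdict(int)
    for key, _ in pairs1:
        partial1[key] += 1
    for key, _ in pairs2:
        partial2[key] += 1
    return partial1, partial2

def sum_partials(partials):
    """Jobs 2 and 3 (assumed): group by the key and sum the partial counts."""
    totals = defaultdict(int)
    for partial in partials:
        for key, count in partial.items():
            totals[key] += count
    return dict(totals)

def multi_group_by(rows, num_reducers=3):
    """Simulate the 3-job plan for two count(distinct colx) group-bys."""
    partitions = spray_by_distinct_column(rows, num_reducers)
    partials = [partial_distinct_counts(p) for p in partitions]
    dest1 = sum_partials(p1 for p1, _ in partials)  # group by col1
    dest2 = sum_partials(p2 for _, p2 in partials)  # group by col2
    return dest1, dest2
```

Summing the partial counts is safe here because the partitions are disjoint on
colx: a given (col1, colx) pair can be counted by only one reducer. The same
property explains why the approach may break down with multiple distinct
columns, since a single spray key can no longer keep every distinct column's
values together.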

This may not be possible if there are multiple distinct columns, but the above
query is very common in data warehousing environments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

