hive-dev mailing list archives

From "Namit Jain (JIRA)"
Subject [jira] Created: (HIVE-1772) optimize join followed by a groupby
Date Fri, 05 Nov 2010 23:30:43 GMT
optimize join followed by a groupby

                 Key: HIVE-1772
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain

explain SELECT x.key, count(1) FROM src1 x JOIN src y ON (x.key = y.key) group by x.key;

  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

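The above query runs as 2 map-reduce jobs.
The first MR job performs the join, and the second MR job performs the group by.
Because the join key and the group-by key are both x.key, the rows reaching the join's reducer are already sorted on that key, so the group by can be performed in the reducer of the join, which removes the need for the second job.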
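
To illustrate the idea, here is a minimal Python sketch of a single reduce phase that both joins and counts. It is not Hive's implementation: the function name join_then_count, the (key, tag) row layout, and the tag values (0 for src1 x, 1 for src y) are assumptions made for the example. It relies only on the rows arriving sorted by the join key, as they would after the shuffle.

```python
from itertools import groupby
from operator import itemgetter


def join_then_count(tagged_rows):
    """Join and count in one pass over rows sorted by key.

    tagged_rows: iterable of (key, tag) pairs sorted by key.
    tag 0 marks a row from src1 (alias x), tag 1 a row from src (alias y).
    These tags are hypothetical, chosen only for this sketch.
    """
    result = []
    # Sorted input means all rows for one key are adjacent,
    # so each key can be joined and aggregated before moving on.
    for key, rows in groupby(tagged_rows, key=itemgetter(0)):
        n_x = n_y = 0
        for _, tag in rows:
            if tag == 0:
                n_x += 1
            else:
                n_y += 1
        # An inner join on this key yields n_x * n_y rows,
        # which is exactly count(1) for the group x.key = key.
        joined = n_x * n_y
        if joined:
            result.append((key, joined))
    return result


# Simulated shuffle output: rows from both tables, sorted by key.
shuffled = sorted([("a", 0), ("a", 1), ("a", 1), ("b", 0), ("c", 1)])
print(join_then_count(shuffled))  # [('a', 2)]
```

Keys "b" and "c" appear in only one table, so the inner join drops them. The count for each key is produced as soon as its rows have been read, with no second sort or shuffle.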
