hive-dev mailing list archives

From "Jan-Erik Hedbom (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-6057) Enable bucketed sorted merge joins of arbitrary subqueries
Date Thu, 19 Dec 2013 10:21:06 GMT
Jan-Erik Hedbom created HIVE-6057:
-------------------------------------

             Summary: Enable bucketed sorted merge joins of arbitrary subqueries
                 Key: HIVE-6057
                 URL: https://issues.apache.org/jira/browse/HIVE-6057
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.12.0
            Reporter: Jan-Erik Hedbom
            Priority: Minor


Currently, Hive cannot use a bucketed sort-merge join (SMJ) when one side of the join is a
subquery result. It would make sense to be able to explicitly specify the bucketing of a
subquery's intermediate output so that bucketed SMJ can be used.

For example, the following query will NOT use bucketed SMJ
(gameends and dummymapping are both clustered and sorted by hashid into 128 buckets):

select * from (select hashid,count(*) as c from gameends group by hashid distribute by hashid
sort by hashid) e join dummymapping m on e.hashid=m.hashid
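
For comparison, here is a sketch of a setup in which a plain table-to-table join would normally be eligible for bucketed SMJ. The column lists and the BIGINT type are assumptions for illustration, since the real schemas are not given; the bucketing clause follows the description of the two tables, and the settings are the usual Hive sort-merge join options.

```sql
-- Assumed schemas: only hashid and the bucketing layout come from the report.
CREATE TABLE IF NOT EXISTS gameends (hashid BIGINT, payload STRING)
CLUSTERED BY (hashid) SORTED BY (hashid) INTO 128 BUCKETS;

CREATE TABLE IF NOT EXISTS dummymapping (hashid BIGINT, mapped STRING)
CLUSTERED BY (hashid) SORTED BY (hashid) INTO 128 BUCKETS;

-- Let the planner convert eligible joins into sort-merge bucket joins.
SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

-- Both sides are bucketed, sorted tables, so this join is a candidate for SMJ.
-- Inspect the plan to check whether a sorted-merge bucket join was chosen.
EXPLAIN
SELECT * FROM gameends e JOIN dummymapping m ON e.hashid = m.hashid;
```

Once the left side is replaced by a subquery, as in the query above, the planner no longer has bucketing metadata for that side, which is the limitation this issue describes.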

Suggestion: Implement an INTO n BUCKETS syntax for subqueries to enable bucketed SMJ:
select * from (select hashid,count(*) as c from gameends group by hashid distribute by hashid
sort by hashid INTO 128 BUCKETS) e join dummymapping m on e.hashid=m.hashid
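
Until something like this exists, one possible workaround is to materialize the subquery into a bucketed, sorted staging table and join against that. The following is a sketch only: gameends_counts is a hypothetical table name, the column types are assumed, and the session is assumed to have the sort-merge join settings (hive.auto.convert.sortmerge.join and related options) enabled already.

```sql
-- Hypothetical staging table, bucketed and sorted the same way as dummymapping.
CREATE TABLE gameends_counts (hashid BIGINT, c BIGINT)
CLUSTERED BY (hashid) SORTED BY (hashid) INTO 128 BUCKETS;

-- Make the insert honor the table's bucketing and sort order (Hive 0.x settings).
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Materialize the aggregated subquery result into the staging table.
INSERT OVERWRITE TABLE gameends_counts
SELECT hashid, count(*) AS c FROM gameends GROUP BY hashid;

-- Both join inputs are now bucketed, sorted tables with 128 buckets.
SELECT * FROM gameends_counts e JOIN dummymapping m ON e.hashid = m.hashid;
```

The cost of this approach is an extra write and read of the intermediate result, which is what the proposed INTO n BUCKETS syntax would avoid.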



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
