hive-dev mailing list archives

From "Kevin Wilfong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-3496) Query plan for multi-join where the third table joined is a subquery containing a map-only union with hive.auto.convert.join=true is wrong
Date Fri, 21 Sep 2012 01:25:09 GMT
Kevin Wilfong created HIVE-3496:
-----------------------------------

             Summary: Query plan for multi-join where the third table joined is a subquery
containing a map-only union with hive.auto.convert.join=true is wrong
                 Key: HIVE-3496
                 URL: https://issues.apache.org/jira/browse/HIVE-3496
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.10.0
            Reporter: Kevin Wilfong
            Assignee: Kevin Wilfong


Take the following query as an example:

EXPLAIN SELECT * FROM 
src11 a JOIN
src12 b ON (a.key = b.key) JOIN
(SELECT * FROM (SELECT * FROM src13 UNION ALL SELECT * FROM src14)a )c ON c.value = b.value;

When hive.auto.convert.join=true, the two joins are implemented separately, each as a
conditional task with two mapjoins and a backup common join.  For the second join, the
backup common join task ends up both contained in the ConditionalTask and listed as a root
task.  This is clearly wrong, and leads to query failures.

I've traced this to the joinUnionPlan method of GenMapRedUtils.  Suppose the union operator
was performed in its own map-reduce task, and that task could be a root task.  When the
union is added to the mapper of the existing task (the one that performs the join in its
reducer), the existing task is made a root task without first checking whether it (the
non-union task) has any dependencies.
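
To make the missing check concrete, here is a minimal, self-contained sketch of the
situation.  It is not Hive code: PlanTask, mergeUnchecked, mergeChecked and the task names
are hypothetical stand-ins invented for illustration, and the real joinUnionPlan logic is
more involved.  It only shows how carrying over root status without looking at the existing
task's dependencies puts a dependent task in the root list, and how a dependency check
would avoid that.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; these are NOT Hive's Task or GenMapRedUtils classes.
class RootTaskSketch {

    /** Hypothetical stand-in for a task in the query plan. */
    static final class PlanTask {
        final String name;
        final List<PlanTask> parents = new ArrayList<>();

        PlanTask(String name) {
            this.name = name;
        }
    }

    /** Behaviour described in the report: root status is carried over blindly. */
    static void mergeUnchecked(List<PlanTask> roots, PlanTask unionTask, PlanTask existing) {
        boolean unionWasRoot = roots.remove(unionTask);
        if (unionWasRoot && !roots.contains(existing)) {
            roots.add(existing);
        }
    }

    /** Assumed shape of a fix: only promote the existing task if nothing precedes it. */
    static void mergeChecked(List<PlanTask> roots, PlanTask unionTask, PlanTask existing) {
        boolean unionWasRoot = roots.remove(unionTask);
        if (unionWasRoot && existing.parents.isEmpty() && !roots.contains(existing)) {
            roots.add(existing);
        }
    }

    /** Builds a two-join plan and reports whether the second join ends up as a root. */
    static boolean secondJoinBecomesRoot(boolean checkDependencies) {
        PlanTask firstJoin = new PlanTask("first-join");
        PlanTask unionTask = new PlanTask("map-only-union");
        PlanTask secondJoin = new PlanTask("second-join-backup");
        secondJoin.parents.add(firstJoin); // the second join depends on the first

        List<PlanTask> roots = new ArrayList<>();
        roots.add(firstJoin);
        roots.add(unionTask);

        if (checkDependencies) {
            mergeChecked(roots, unionTask, secondJoin);
        } else {
            mergeUnchecked(roots, unionTask, secondJoin);
        }
        return roots.contains(secondJoin);
    }

    public static void main(String[] args) {
        System.out.println("unchecked: " + secondJoinBecomesRoot(false));
        System.out.println("checked:   " + secondJoinBecomesRoot(true));
    }
}
```

In this toy model the unchecked merge leaves the second join in the root list even though
it still depends on the first join, which mirrors the symptom above; the checked variant
does not.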

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira