hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Date Sun, 03 Feb 2013 19:00:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569858#comment-13569858
] 

Ashutosh Chauhan commented on HIVE-3326:
----------------------------------------

Now that HIVE-3784 went in, this bug is still present in GenMapRedUtils.java. But I suspect
we will never encounter this because hints for multiple map-joins are no longer supported
so its impossible to enforce map-joins. They will automatically get converted in physical
optimizer. This implies though bug is there, there is no reasonable way to hit this code path.
[~namit] is that correct ? If this is true, than instead of having this bug lurking in codebase
and worrying about it, we should refactor the code to get rid of this dead code. If this is
not true, than we should commit this patch, perhaps with testcase which will produce this
behavior.
                
> plan for multiple mapjoin followed by a normal join is wrong
> ------------------------------------------------------------
>
>                 Key: HIVE-3326
>                 URL: https://issues.apache.org/jira/browse/HIVE-3326
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>         Environment: OS X 10.8; java 1.6.0_33
>            Reporter: Zhang Xinyu
>            Assignee: Navis
>         Attachments: HIVE-3326.D8091.1.patch, patch.diff
>
>
> example queries:
> {code}
> create table yudi(c1 int, c2 int, c3 int, c4 int);
> create table wangmu(c1 int, c2 int, c3 int, c4 int);
> select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2
join yudi d on a.c3=d.c3;
> {code}
> in explain mode, I got this:
> {code}
> hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join
wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
> OK
> STAGE DEPENDENCIES:
>   Stage-8 is a root stage
>   Stage-2 depends on stages: Stage-8
>   Stage-7 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-7
>   Stage-1 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-8
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         b
>         <Not Important>
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a
>         <Not Important>
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-7
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         c
>         <Not Important>
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>            file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
>         <Not Important>
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         d
>           TableScan
>         file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
>           Select Operator
>       Reduce Operator Tree:
>       <Not Important>
> {code}
> You see, mapper of Stage-1 should read from Stage-3, maybe '.../-mr-10003', not Stage-2(result
in '.../-mr-10002').
> To resolve this bug, I found these codes(GenMapRedUtils.java, about line 431):
> {code:title=GenMapRedUtils.java}
> if (oldMapJoin == null) {
>   if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
>       || local || (oldTask != null) && (parTasks != null)) {
>     taskTmpDir = mjCtx.getTaskTmpDir();
>     tt_desc = mjCtx.getTTDesc();
>     rootOp = mjCtx.getRootMapJoinOp();
>     }
> } else {
>   GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
>   assert oldMjCtx != null;
>   taskTmpDir = oldMjCtx.getTaskTmpDir();
>   tt_desc = oldMjCtx.getTTDesc();
>   rootOp = oldMjCtx.getRootMapJoinOp();
> }
> {code}
> my query goes into 'else' block and gets wrong taskTmpDir. I hack them to let query go
into 'if' block, and it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message