hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
Date Tue, 22 Jan 2013 00:42:14 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3326:
------------------------------

    Attachment: HIVE-3326.D8091.1.patch

navis requested code review of "HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong".
Reviewers: JIRA

  DPAL-1968 plan for multiple mapjoin followed by a normal join is wrong

  example queries:

  create table yudi(c1 int, c2 int, c3 int, c4 int);
  create table wangmu(c1 int, c2 int, c3 int, c4 int);
  select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2
    join yudi d on a.c3=d.c3;

  in explain mode, I got this:

  hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1
      join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
  OK
  STAGE DEPENDENCIES:
    Stage-8 is a root stage
    Stage-2 depends on stages: Stage-8
    Stage-7 depends on stages: Stage-2
    Stage-3 depends on stages: Stage-7
    Stage-1 depends on stages: Stage-3

  STAGE PLANS:
    Stage: Stage-8
      Map Reduce Local Work
        Alias -> Map Local Tables:
          b
          <Not Important>
    Stage: Stage-2
      Map Reduce
        Alias -> Map Operator Tree:
          a
          <Not Important>
        Local Work:
          Map Reduce Local Work

    Stage: Stage-7
      Map Reduce Local Work
        Alias -> Map Local Tables:
          c
          <Not Important>
    Stage: Stage-3
      Map Reduce
        Alias -> Map Operator Tree:
             file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
          <Not Important>
        Local Work:
          Map Reduce Local Work

    Stage: Stage-1
      Map Reduce
        Alias -> Map Operator Tree:
          d
            TableScan

          file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
            Select Operator

        Reduce Operator Tree:
        <Not Important>

  As the plan shows, the mapper of Stage-1 reads the output of Stage-2 ('.../-mr-10002'). It should
  read the output of Stage-3 instead (perhaps '.../-mr-10003').

  While tracking down this bug, I found the following code (GenMapRedUtils.java, around line 431):
  GenMapRedUtils.java

  if (oldMapJoin == null) {
    if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
        || local || (oldTask != null) && (parTasks != null)) {
      taskTmpDir = mjCtx.getTaskTmpDir();
      tt_desc = mjCtx.getTTDesc();
      rootOp = mjCtx.getRootMapJoinOp();
    }
  } else {
    GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
    assert oldMjCtx != null;
    taskTmpDir = oldMjCtx.getTaskTmpDir();
    tt_desc = oldMjCtx.getTTDesc();
    rootOp = oldMjCtx.getRootMapJoinOp();
  }

  My query goes into the 'else' block and picks up the wrong taskTmpDir. I hacked the code so that
  the query goes into the 'if' block instead, and it works.
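
  To make the symptom easier to follow, here is a toy model of the branch above. It is only a
  sketch and not Hive code: the class TmpDirSketch, the helper names, and the operator keys
  "mapjoin-b" and "mapjoin-c" are hypothetical. The directory '.../-mr-10003' is the reporter's
  guess, and the inner 'if' condition is left out for brevity. It shows that taking the temp
  directory from the old map join's context yields the output of the first map join, while
  taking it from the current map join's context yields the output of the second.

```java
import java.util.HashMap;
import java.util.Map;

public class TmpDirSketch {
    // Hypothetical stand-in for GenMRMapJoinCtx; only the temp directory is modelled.
    static final class MapJoinCtx {
        final String taskTmpDir;

        MapJoinCtx(String taskTmpDir) {
            this.taskTmpDir = taskTmpDir;
        }
    }

    // Simplified version of the quoted branch: with no old map join, use the
    // current operator's context ('if' block); otherwise reuse the old map
    // join's context ('else' block).
    static String pickTmpDir(Map<String, MapJoinCtx> ctxs, String mjOp, String oldMapJoin) {
        String key = (oldMapJoin == null) ? mjOp : oldMapJoin;
        return ctxs.get(key).taskTmpDir;
    }

    // Assumed setup: the first map join (with b) writes to -mr-10002 and the
    // second (with c) writes to -mr-10003. The second path is a guess.
    static Map<String, MapJoinCtx> exampleContexts() {
        Map<String, MapJoinCtx> ctxs = new HashMap<>();
        ctxs.put("mapjoin-b", new MapJoinCtx(".../-mr-10002"));
        ctxs.put("mapjoin-c", new MapJoinCtx(".../-mr-10003"));
        return ctxs;
    }

    public static void main(String[] args) {
        Map<String, MapJoinCtx> ctxs = exampleContexts();
        // 'else' block: the stale directory of the first map join is returned.
        System.out.println("else branch: " + pickTmpDir(ctxs, "mapjoin-c", "mapjoin-b"));
        // 'if' block: the directory of the second map join is returned.
        System.out.println("if branch:   " + pickTmpDir(ctxs, "mapjoin-c", null));
    }
}
```

  In this model the 'else' branch returns '.../-mr-10002', which matches the wrong input path
  shown for Stage-1 in the plan above.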

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8091

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/test/queries/clientpositive/mapjoin_mapjoin_join.q
  ql/src/test/results/clientpositive/mapjoin_mapjoin_join.q.out

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/19497/

To: JIRA, navis

                
> plan for multiple mapjoin followed by a normal join is wrong
> ------------------------------------------------------------
>
>                 Key: HIVE-3326
>                 URL: https://issues.apache.org/jira/browse/HIVE-3326
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>         Environment: OS X 10.8; java 1.6.0_33
>            Reporter: Zhang Xinyu
>            Assignee: Navis
>         Attachments: HIVE-3326.D8091.1.patch, patch.diff
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
