hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HIVE-413) multi-table insert
Date Wed, 15 Apr 2009 07:01:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699092#action_12699092 ]

Zheng Shao edited comment on HIVE-413 at 4/15/09 12:00 AM:
-----------------------------------------------------------

GenMRRedSink1.process calls GenMapRedUtils.initPlan, which calls GenMapRedUtils.setTaskPlan, which calls
GenMapRedUtils.setKeyAndValueDesc.
GenMRRedSink1.process also calls GenMapRedUtils.splitPlan, which calls GenMapRedUtils.splitTasks, which calls
GenMapRedUtils.setKeyAndValueDesc.

In setKeyAndValueDesc (shown below, from GenMapRedUtils.java:236), we walk down the operator tree
and pick up every reachable ReduceSinkOperator, even those that belong to another MapRedTask,
because the operator graph has not been split yet.

Instead of calling GenMapRedUtils.setKeyAndValueDesc inline from GenMapRedUtils.setTaskPlan and
GenMapRedUtils.splitTasks, we should first break up all the MapRedTasks and then, for each task,
call GenMapRedUtils.setKeyAndValueDesc on each of its topOps.

{code}
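  public static void setKeyAndValueDesc(mapredWork plan, Operator<? extends Serializable> topOp) {
    if (topOp instanceof ReduceSinkOperator) {
      // A ReduceSink ends this map-side path: record its key schema and the
      // value schema for its tag (a negative tag is mapped to index 0).
      ReduceSinkOperator rs = (ReduceSinkOperator)topOp;
      plan.setKeyDesc(rs.getConf().getKeySerializeInfo());
      int tag = Math.max(0, rs.getConf().getTag());
      List<tableDesc> tagToSchema = plan.getTagToValueDesc();
      // Grow the tag-indexed list until index 'tag' exists.
      while (tag + 1 > tagToSchema.size()) {
        tagToSchema.add(null);
      }
      tagToSchema.set(tag, rs.getConf().getValueSerializeInfo());
    } else {
      // Not a ReduceSink: keep walking. Before the operator graph is split per
      // task, this recursion can reach ReduceSinks owned by other MapRedTasks.
      List<Operator<? extends Serializable>> children = topOp.getChildOperators();

      if (children != null) {
        for (Operator<? extends Serializable> op : children) {
          setKeyAndValueDesc(plan, op);
        }
      }
    }
  }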
{code}
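The difference between walking the unsplit graph and splitting first can be seen in a small self-contained model. The classes below (Op, ReduceSink, Plan) and the methods setKeyDesc and splitPerSink are hypothetical stand-ins, not Hive's Operator, ReduceSinkOperator or mapredWork, and only the key descriptor is modeled. setKeyDesc follows the same recursion as setKeyAndValueDesc above; splitPerSink (copy each root-to-sink path into its own task) is just an assumed way to "break up" the graph for illustration, not what GenMapRedUtils.splitTasks actually does.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Hive's classes: Op ~ Operator,
// ReduceSink ~ ReduceSinkOperator, Plan ~ mapredWork (key descriptor only).
public class SetKeyAndValueDescSketch {

    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        // Attaches a child and returns it, so calls can be chained down a path.
        Op add(Op child) {
            children.add(child);
            return child;
        }
    }

    static class ReduceSink extends Op {
        final String keyDesc;

        ReduceSink(String name, String keyDesc) {
            super(name);
            this.keyDesc = keyDesc;
        }
    }

    static class Plan {
        String keyDesc;                                      // what the task would end up using
        final List<String> keyDescsSeen = new ArrayList<>(); // every sink the walk reached
    }

    // Same shape as setKeyAndValueDesc: stop at a ReduceSink, else recurse.
    static void setKeyDesc(Plan plan, Op op) {
        if (op instanceof ReduceSink) {
            ReduceSink rs = (ReduceSink) op;
            plan.keyDesc = rs.keyDesc;   // a later sink overwrites an earlier one
            plan.keyDescsSeen.add(rs.keyDesc);
        } else {
            for (Op child : op.children) {
                setKeyDesc(plan, child);
            }
        }
    }

    // Assumed "split first" step: for every ReduceSink reachable from the shared
    // root, copy just the root-to-sink path so each task gets its own top op.
    static List<Op> splitPerSink(Op root) {
        List<Op> taskRoots = new ArrayList<>();
        collectPaths(root, new ArrayList<>(), taskRoots);
        return taskRoots;
    }

    private static void collectPaths(Op op, List<Op> path, List<Op> taskRoots) {
        path.add(op);
        if (op instanceof ReduceSink) {
            Op copyRoot = null;
            Op tail = null;
            for (Op p : path) {
                // Copy non-sink operators; the sink itself is reused, not copied.
                Op copy = (p instanceof ReduceSink) ? p : new Op(p.name);
                if (copyRoot == null) {
                    copyRoot = copy;
                } else {
                    tail.add(copy);
                }
                tail = copy;
            }
            taskRoots.add(copyRoot);
        } else {
            for (Op child : op.children) {
                collectPaths(child, path, taskRoots);
            }
        }
        path.remove(path.size() - 1);
    }

    public static void main(String[] args) {
        // One table scan feeding two inserts with different group-by keys.
        Op scan = new Op("TS[src]");
        scan.add(new Op("SEL[dest1]")).add(new ReduceSink("RS1", "key,value"));
        scan.add(new Op("SEL[dest2]")).add(new ReduceSink("RS2", "key"));

        // Walk before splitting: both sinks feed one plan and the last one wins.
        Plan inline = new Plan();
        setKeyDesc(inline, scan);
        System.out.println("inline " + inline.keyDescsSeen + " -> " + inline.keyDesc);

        // Split first, then walk each task's own top op.
        for (Op taskRoot : splitPerSink(scan)) {
            Plan plan = new Plan();
            setKeyDesc(plan, taskRoot);
            System.out.println("task   " + plan.keyDescsSeen + " -> " + plan.keyDesc);
        }
    }
}
{code}

In this model the inline walk over the unsplit graph records both sinks and leaves the single plan with whichever key descriptor it saw last, while walking each task's top op after splitting yields exactly one key descriptor per task.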

      was (Author: zshao):
    GenMRRedSink1.process calls GenMapRedUtils.initPlan calls GenMapRedUtils.setKeyAndValueDesc.

  
> multi-table insert
> ------------------
>
>                 Key: HIVE-413
>                 URL: https://issues.apache.org/jira/browse/HIVE-413
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Critical
>
> There is a problem in multi-table insert when the two inserts contain different grouping keys.
> I have not marked it a blocker, since a workaround exists (issue both inserts separately),
> but if the release is not yet done, we should fix this as well.
> FROM SRC
> INSERT OVERWRITE TABLE DEST1 SELECT SRC.key, src.value, COUNT(DISTINCT SUBSTR(SRC.value,5)) GROUP BY SRC.key, src.value
> INSERT OVERWRITE TABLE DEST2 SELECT SRC.key, COUNT(DISTINCT SUBSTR(SRC.value,5)) GROUP BY SRC.key;
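To make the expected result of the reported query concrete, here is a hypothetical in-memory model of the two inserts, which could serve as a reference when checking a fix. It assumes non-NULL values and relies on Hive's SUBSTR being 1-based, so SUBSTR(value,5) corresponds to value.substring(4) in Java. The class and method names are made up for illustration and the sample rows are toy data.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reference model of the two inserts in the HIVE-413 query,
// computed in memory. Rows are {key, value}; values are assumed non-NULL.
public class MultiInsertOracle {

    // Hive's SUBSTR is 1-based, so SUBSTR(value, 5) starts at Java index 4;
    // a string too short for that position yields "".
    static String substr5(String value) {
        return value.length() > 4 ? value.substring(4) : "";
    }

    // DEST1: GROUP BY key, value; COUNT(DISTINCT SUBSTR(value, 5)) per group.
    static Map<List<String>, Integer> dest1(List<String[]> src) {
        Map<List<String>, Set<String>> groups = new LinkedHashMap<>();
        for (String[] row : src) {
            groups.computeIfAbsent(Arrays.asList(row[0], row[1]), k -> new HashSet<>())
                  .add(substr5(row[1]));
        }
        return distinctCounts(groups);
    }

    // DEST2: GROUP BY key only; COUNT(DISTINCT SUBSTR(value, 5)) per group.
    static Map<String, Integer> dest2(List<String[]> src) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String[] row : src) {
            groups.computeIfAbsent(row[0], k -> new HashSet<>()).add(substr5(row[1]));
        }
        return distinctCounts(groups);
    }

    // Turns each group's set of distinct substrings into its size.
    static <K> Map<K, Integer> distinctCounts(Map<K, Set<String>> groups) {
        Map<K, Integer> counts = new LinkedHashMap<>();
        groups.forEach((k, distinct) -> counts.put(k, distinct.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> src = Arrays.asList(
            new String[] {"238", "val_238"},
            new String[] {"86", "val_86"},
            new String[] {"238", "val_238"},
            new String[] {"86", "val_860"});
        System.out.println("DEST1 " + dest1(src));
        System.out.println("DEST2 " + dest2(src));
    }
}
{code}

Because SUBSTR(value,5) depends only on value, and value is part of DEST1's grouping key, every DEST1 count is 1 for non-NULL values; DEST2 groups by key alone, so a key with several different values can count more than one. The two inserts therefore need different reduce keys, which is the situation the report describes.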

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

