hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-8622) Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
Date Sun, 09 Nov 2014 14:56:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203758#comment-14203758 ]

Xuefu Zhang edited comment on HIVE-8622 at 11/9/14 2:55 PM:
------------------------------------------------------------

Here is my pseudocode showing my attempt to solve this seemingly complex problem:
{code}
// Notation:
// MJWork = a work with a map join operator
// HTSWork = a work with a HashTableSinkOperator
// sparkWork = the input, i.e. the original SparkWork

// Each MJWork will build a SparkWork for its small-table works. This info is held in a
// map <MJWork, SparkWork>, originally empty and named childSparkWorkMap.
Map<MJWork, SparkWork> childSparkWorkMap = new HashMap<MJWork, SparkWork>();

// Dependency graph among all SparkWorks. This is our final result, with its root at sparkWork.
Map<SparkWork, List<SparkWork>> dependencyGraph = new HashMap<SparkWork, List<SparkWork>>();

// Process the original SparkWork from leaves backwards to roots.
List<BaseWork> leaves = sparkWork.getLeaves();
for (BaseWork leaf : leaves) {
  move(leaf, sparkWork);
}

/**
 * Move a work from original SparkWork to the target SparkWork
 */
void move(BaseWork work, SparkWork target) {
  // Parents are always looked up in the original SparkWork.
  List<BaseWork> parents = sparkWork.getParents(work);
  if (sparkWork != target) {
    // TODO: move the work from the SparkWork that currently holds it to target.
  }

  if (!(work instanceof MJWork)) {
    for (BaseWork parent : parents) {
      // move each parent to the same SparkWork as the work itself
      move(parent, target);
    }
  } else {
    // It's an MJWork: its HTSWork parents go into a new child SparkWork.
    SparkWork childSparkWork = new SparkWork();
    // TODO: update dependencyGraph, target depends on childSparkWork
    childSparkWorkMap.put((MJWork) work, childSparkWork);
    for (BaseWork parent : parents) {
      if (parent instanceof HTSWork) {
        move(parent, childSparkWork);
      } else {
        move(parent, target);
      }
    }
  }
}
{code}
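
To make the recursion above easier to try out, here is a minimal, self-contained sketch of the same idea on a toy model. It is not Hive code: Work, Plan, Kind and MapJoinSplitSketch are hypothetical stand-ins for BaseWork, SparkWork and the MJWork/HTSWork notation, and the two TODOs are filled in with assumed behaviour (moving only transfers membership between plans, and the dependency graph gets one edge from the target to each new child plan). Rewiring of edges inside the child plans is left out, and a work reachable along several paths with different targets is not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: Work, Plan and Kind are hypothetical stand-ins for Hive's
// BaseWork, SparkWork and the MJWork/HTSWork notation; they are not Hive APIs.
final class MapJoinSplitSketch {
  enum Kind { PLAIN, MAP_JOIN, HASH_TABLE_SINK }

  static final class Work {
    final String name;
    final Kind kind;
    Work(String name, Kind kind) { this.name = name; this.kind = kind; }
  }

  static final class Plan {
    final String name;
    final Set<Work> works = new LinkedHashSet<>();
    final Map<Work, List<Work>> parents = new LinkedHashMap<>();
    Plan(String name) { this.name = name; }

    void connect(Work parent, Work child) {
      works.add(parent);
      works.add(child);
      parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }
    List<Work> getParents(Work work) {
      return parents.getOrDefault(work, new ArrayList<>());
    }
    // Leaves are works that no other work lists as a parent.
    List<Work> getLeaves() {
      Set<Work> hasChild = new LinkedHashSet<>();
      for (List<Work> ps : parents.values()) hasChild.addAll(ps);
      List<Work> leaves = new ArrayList<>();
      for (Work w : works) if (!hasChild.contains(w)) leaves.add(w);
      return leaves;
    }
    // Sorted names of the works currently held by this plan.
    Set<String> workNames() {
      Set<String> names = new TreeSet<>();
      for (Work w : works) names.add(w.name);
      return names;
    }
  }

  final Plan original;
  final Map<Work, Plan> childPlanMap = new LinkedHashMap<>();
  final Map<Plan, List<Plan>> dependencyGraph = new LinkedHashMap<>();

  MapJoinSplitSketch(Plan original) { this.original = original; }

  // Process the original plan from its leaves backwards to its roots.
  void split() {
    for (Work leaf : original.getLeaves()) move(leaf, original);
  }

  void move(Work work, Plan target) {
    // Parent edges are always read from the original plan, which keeps them.
    List<Work> parents = original.getParents(work);
    if (original != target) {
      // Assumed reading of the first TODO: transfer membership only.
      original.works.remove(work);
      target.works.add(work);
    }
    if (work.kind != Kind.MAP_JOIN) {
      for (Work parent : parents) move(parent, target);
      return;
    }
    Plan child = new Plan("child-of-" + work.name);
    // Assumed reading of the second TODO: target depends on child.
    dependencyGraph.computeIfAbsent(target, k -> new ArrayList<>()).add(child);
    childPlanMap.put(work, child);
    for (Work parent : parents) {
      move(parent, parent.kind == Kind.HASH_TABLE_SINK ? child : target);
    }
  }

  // Builds pre -> hts -> join -> agg plus big -> join, then splits it.
  static MapJoinSplitSketch demo() {
    Work pre = new Work("pre", Kind.PLAIN);
    Work hts = new Work("hts", Kind.HASH_TABLE_SINK);
    Work big = new Work("big", Kind.PLAIN);
    Work join = new Work("join", Kind.MAP_JOIN);
    Work agg = new Work("agg", Kind.PLAIN);
    Plan plan = new Plan("root");
    plan.connect(pre, hts);
    plan.connect(hts, join);
    plan.connect(big, join);
    plan.connect(join, agg);
    MapJoinSplitSketch sketch = new MapJoinSplitSketch(plan);
    sketch.split();
    return sketch;
  }
}
```

In this toy plan, calling MapJoinSplitSketch.demo() leaves agg, join and big in the root plan and moves hts together with its upstream work pre into a single child plan that the root depends on. That matches the intent of running the small-table side first and the map join afterwards.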



> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> ----------------------------------------------------------------
>
>                 Key: HIVE-8622
>                 URL: https://issues.apache.org/jira/browse/HIVE-8622
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Suhas Satish
>            Assignee: Chao
>         Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch, HIVE-8622.patch
>
>
> This is a sub-task of map-join for Spark:
> https://issues.apache.org/jira/browse/HIVE-7613
> It can use the baseline patch for map-join:
> https://issues.apache.org/jira/browse/HIVE-8616



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
