hive-dev mailing list archives

From "Brock Noland" <br...@cloudera.com>
Subject Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
Date Sat, 20 Sep 2014 01:03:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review54065
-----------------------------------------------------------


Awesome work!!!! I have a few minor comments that can be addressed in a *follow-on* patch.


ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java
<https://reviews.apache.org/r/25394/#comment94014>

    It sounds like we'll be creating a multi-insert-specific context? If so, can
we make all the members private?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
<https://reviews.apache.org/r/25394/#comment94012>

    nit: this should call add() rather than offer(), since offer() simply returns false when it fails instead of throwing.
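To illustrate the nit above (a minimal sketch with java.util.Queue, not the patch's actual code): on a bounded queue, offer() reports failure only through its return value, which is easy to ignore silently, while add() throws and surfaces the bug.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class OfferVsAdd {
    public static void main(String[] args) {
        // Bounded queue with capacity 1, so the second element cannot fit.
        Queue<String> queue = new ArrayBlockingQueue<>(1);
        queue.add("first");

        // offer() just returns false when the queue is full.
        boolean offered = queue.offer("second");
        System.out.println("offer returned: " + offered);

        // add() throws IllegalStateException instead, making the failure loud.
        try {
            queue.add("second");
            System.out.println("add succeeded");
        } catch (IllegalStateException e) {
            System.out.println("add threw IllegalStateException");
        }
    }
}
```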



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
<https://reviews.apache.org/r/25394/#comment94015>

    Please include lca.getNumChild() in the error message.


- Brock Noland


On Sept. 20, 2014, 12:04 a.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25394/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2014, 12:04 a.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: HIVE-7503
>     https://issues.apache.org/jira/browse/HIVE-7503
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with staging, the inserts will happen sequentially, making performance suffer.
> This task is to find out what it takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimal way to do it.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a07 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/25394/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chao Sun
> 
>
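For reference, the kind of multi-table insert that HIVE-7503 targets (per the Hive DML manual linked above) shares one scan of the source table across several inserts. A hypothetical example with made-up table names:

```sql
-- One pass over src feeds two destination tables.
FROM src
INSERT OVERWRITE TABLE dest1 SELECT key, value WHERE key < 100
INSERT OVERWRITE TABLE dest2 SELECT key, value WHERE key >= 100;
```

On MapReduce each INSERT clause may become its own job; the goal here is for Spark to execute them without re-reading or re-computing src.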

