hive-dev mailing list archives

From "Xuefu Zhang" <xzh...@cloudera.com>
Subject Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
Date Fri, 19 Sep 2014 20:14:10 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review54004
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
<https://reviews.apache.org/r/25394/#comment93870>

    I was thinking that, since you have all the paths at hand, you could just keep moving all
branches up together and check whether the LCA is hit. I didn't realize we do this while we
are still traversing the tree.
    
    I have to admit that I don't quite get the whole logic here. Does the LCA change before
all TSs are visited?
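
    To make the "paths at hand" idea concrete, here is a minimal sketch. It is not the code
in the patch. It assumes the operator plan is a tree (one parent per operator) and that each
path is listed from the shared root down to a leaf, so the LCA is simply the last operator
that all paths agree on. The class name, method name, and operator labels (TS_0, SEL_1, ...)
are hypothetical. Walking down from the root like this gives the same answer as moving all
branches up together, as long as the tree assumption holds.

```java
import java.util.Arrays;
import java.util.List;

public class LcaSketch {

    /**
     * Returns the deepest node shared by every root-to-leaf path, or null
     * if there are no paths or the paths do not start at the same root.
     * Assumption: the plan is a tree, so each leaf has exactly one path.
     */
    static String lowestCommonAncestor(List<List<String>> paths) {
        if (paths.isEmpty()) {
            return null;
        }
        String lca = null;
        List<String> first = paths.get(0);
        for (int depth = 0; depth < first.size(); depth++) {
            String candidate = first.get(depth);
            for (List<String> path : paths) {
                // Stop at the first depth where some path ends or diverges.
                if (depth >= path.size() || !path.get(depth).equals(candidate)) {
                    return lca;
                }
            }
            lca = candidate;
        }
        return lca;
    }

    public static void main(String[] args) {
        // Hypothetical operator labels, one path per insert branch.
        List<List<String>> paths = Arrays.asList(
            Arrays.asList("TS_0", "SEL_1", "GBY_2", "FS_3"),
            Arrays.asList("TS_0", "SEL_1", "FIL_4", "FS_5"));
        System.out.println(lowestCommonAncestor(paths));
    }
}
```

    Because this only runs once every path is known, the result cannot change midway, which
is the property I was expecting.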



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
<https://reviews.apache.org/r/25394/#comment93876>

    Correct me if I'm wrong. For the whole graph, we find at most one LCA at which to split
the plan, right? Also, the LCA can in no way be a FORWARD, right? But there can be multiple
FORWARDs, which can have a common ancestor, which might be a point to split.
    
    Again, I don't quite understand how the LCA is identified while we are still visiting the
tree. But I'm sure that we don't want to create more Spark jobs than needed. If we don't do
better than MR when we could, then the meaning of the project would be greatly compromised.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
<https://reviews.apache.org/r/25394/#comment93869>

    Okay. Fair enough.


- Xuefu Zhang


On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25394/
> -----------------------------------------------------------
> 
> (Updated Sept. 18, 2014, 6:38 p.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: HIVE-7503
>     https://issues.apache.org/jira/browse/HIVE-7503
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML),
> there may be an MR job for each insert. When we achieve this with Spark, it would be nice
> if all the inserts can happen concurrently.
> It seems that this functionality isn't available in Spark. To make things worse, the
> source of the insert may be re-computed unless it's staged. Even with this, the inserts will
> happen sequentially, making the performance suffer.
> This task is to find out what it takes in Spark to enable this without requiring staging
> the source and sequential insertion. If this has to be solved in Hive, find out an optimum
> way to do this.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf643a0e90fc4acc21187f6d78cefdb1b691a
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/25394/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

