hive-dev mailing list archives

From "Chao Sun" <chao....@cloudera.com>
Subject Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
Date Thu, 18 Sep 2014 18:39:01 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
-----------------------------------------------------------

(Updated Sept. 18, 2014, 6:38 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
-------

Mainly changed the way the multi-insertion pattern is detected.


Bugs: HIVE-7503
    https://issues.apache.org/jira/browse/HIVE-7503


Repository: hive-git


Description
-------

Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)
may run one MR job for each insert. When we implement this with Spark, it would be nice
if all the inserts could happen concurrently.
It seems that this functionality isn't available in Spark. To make things worse, the source
of the insert may be recomputed unless it's staged. Even with staging, the inserts happen
sequentially, which hurts performance.
This task is to find out what it takes in Spark to enable this without staging the
source and without sequential insertion. If this has to be solved in Hive, find an optimal way
to do it.
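
To illustrate the recomputation concern described above, here is a minimal plain-Java
sketch. It does not use Spark or Hive APIs and is not the approach taken in this patch;
the names MultiInsertSketch, scanSource, and insertBranch are hypothetical stand-ins for
the shared source and the individual insert branches. It only counts how often the
source is evaluated with and without staging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only: no Spark or Hive classes are involved.
public class MultiInsertSketch {
    // Counts how many times the shared source is evaluated.
    static int sourceEvaluations = 0;

    // Stand-in for the common source (e.g. a table scan) of a multi-insert query.
    static List<Integer> scanSource() {
        sourceEvaluations++;
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            rows.add(i);
        }
        return rows;
    }

    // Stand-in for one insert branch: keeps the rows that match its filter.
    static List<Integer> insertBranch(List<Integer> rows, Predicate<Integer> keep) {
        List<Integer> written = new ArrayList<>();
        for (Integer row : rows) {
            if (keep.test(row)) {
                written.add(row);
            }
        }
        return written;
    }

    // Without staging: every insert branch evaluates the source again.
    static int runWithoutStaging() {
        sourceEvaluations = 0;
        insertBranch(scanSource(), r -> r % 2 == 0);
        insertBranch(scanSource(), r -> r > 3);
        return sourceEvaluations;
    }

    // With staging: the source is evaluated once, but the branches
    // still run one after another.
    static int runWithStaging() {
        sourceEvaluations = 0;
        List<Integer> staged = scanSource();
        insertBranch(staged, r -> r % 2 == 0);
        insertBranch(staged, r -> r > 3);
        return sourceEvaluations;
    }

    public static void main(String[] args) {
        System.out.println("source evaluations without staging: " + runWithoutStaging());
        System.out.println("source evaluations with staging: " + runWithStaging());
    }
}
```

In this toy model, staging reduces the source evaluations from two to one, but the
two branches still execute sequentially, which is the remaining problem this task targets.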


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf643a0e90fc4acc21187f6d78cefdb1b691a

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION


Diff: https://reviews.apache.org/r/25394/diff/


Testing
-------


Thanks,

Chao Sun

