hive-dev mailing list archives

From "Chao Sun"
Subject Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
Date Fri, 05 Sep 2014 20:35:46 GMT

(Updated Sept. 5, 2014, 8:35 p.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-7503

Repository: hive-git


For Hive's multi-insert query,
there may be an MR job for each insert. When we implement this with Spark, it would be nice
if all the inserts could happen concurrently.
It seems that this functionality isn't available in Spark. To make things worse, the source
of the insert may be recomputed unless it's staged. Even with staging, the inserts will happen
sequentially, which hurts performance.
This task is to find out what it takes in Spark to enable this without staging the
source and without sequential insertion. If this has to be solved in Hive, find an optimal
way to do it.
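
To illustrate the problem described above, here is a minimal sketch in plain Java. It is
an analogy only: it uses neither Spark nor Hive APIs and is not taken from the patch under
review. The class and method names (MultiInsertSketch, scanSource, insertBranch) are
hypothetical. The shared source is evaluated once, and the two insert branches then run
concurrently against that single result instead of rescanning the source one after another.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Analogy for a Hive multi-table insert of the form:
//   FROM src
//   INSERT OVERWRITE TABLE evens SELECT key WHERE key % 2 = 0
//   INSERT OVERWRITE TABLE odds  SELECT key WHERE key % 2 = 1;
// All names here are hypothetical; no Spark or Hive classes are involved.
class MultiInsertSketch {
    // Counts how many times the shared source is evaluated.
    static final AtomicInteger sourceScans = new AtomicInteger();

    // Stand-in for the common FROM clause: produces the rows 1..10.
    static List<Integer> scanSource() {
        sourceScans.incrementAndGet();
        return IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
    }

    // Stand-in for one INSERT branch: keeps the even or the odd rows.
    static List<Integer> insertBranch(List<Integer> rows, boolean even) {
        return rows.stream()
                .filter(r -> (r % 2 == 0) == even)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Evaluate the source once and share the result between branches.
        List<Integer> shared = scanSource();

        // Run both branches concurrently on the shared rows.
        CompletableFuture<List<Integer>> evens =
                CompletableFuture.supplyAsync(() -> insertBranch(shared, true));
        CompletableFuture<List<Integer>> odds =
                CompletableFuture.supplyAsync(() -> insertBranch(shared, false));

        System.out.println("evens: " + evens.join());
        System.out.println("odds: " + odds.join());
        System.out.println("source scans: " + sourceScans.get());
    }
}
```

Calling scanSource() inside each branch instead would evaluate the source twice, which
corresponds to the recomputation cost mentioned above. Waiting for one branch before
starting the other corresponds to the sequential insertion.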

Diffs (updated)

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ 9c808d4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ 5ddc16d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ 379a39c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ 864965e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ 76fc290 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ PRE-CREATION




Chao Sun
