hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-7503) Support Hive's multi-table insert query with Spark
Date Thu, 24 Jul 2014 18:30:40 GMT
Xuefu Zhang created HIVE-7503:
---------------------------------

             Summary: Support Hive's multi-table insert query with Spark
                 Key: HIVE-7503
                 URL: https://issues.apache.org/jira/browse/HIVE-7503
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Xuefu Zhang


For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML),
there may be an MR job for each insert.  When we implement this with Spark, it would be nice
if all the inserts could happen concurrently.

It seems that this functionality isn't available in Spark. To make things worse, the source
of the insert may be re-computed for each insert unless it's staged. Even with staging, the
inserts will happen sequentially, making the performance suffer.
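
To make the three cases concrete, here is a minimal sketch in plain Python. It does not
use Spark or Hive APIs; it only simulates the behaviour described above. All names
(Source, insert, unstaged, staged_sequential, staged_concurrent) are hypothetical, and
each "insert" is modelled as a simple filter over the source rows.

```python
from concurrent.futures import ThreadPoolExecutor


class Source:
    """Hypothetical stand-in for the shared source of a multi-insert query."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.scans = 0  # how many times the source has been computed

    def compute(self):
        self.scans += 1
        return list(self.rows)


def insert(rows, predicate):
    """Model one insert branch as a filter; returns the rows it would write."""
    return [r for r in rows if predicate(r)]


def unstaged(source, predicates):
    """No staging: every insert re-computes the source."""
    return [insert(source.compute(), p) for p in predicates]


def staged_sequential(source, predicates):
    """Staging: the source is computed once, but inserts run one after another."""
    staged = source.compute()
    return [insert(staged, p) for p in predicates]


def staged_concurrent(source, predicates):
    """Source computed once, inserts submitted to run concurrently."""
    staged = source.compute()
    with ThreadPoolExecutor(max_workers=max(1, len(predicates))) as pool:
        # pool.map preserves the order of the predicates in its results.
        return list(pool.map(lambda p: insert(staged, p), predicates))
```

With two insert branches, unstaged computes the source twice, while both staged variants
compute it once; all three produce the same rows. The sketch says nothing about how this
would actually be achieved inside Spark, which is the open question of this task.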

This task is to find out what it takes in Spark to enable this without staging the
source or inserting sequentially. If this has to be solved in Hive, find an optimal way
to do this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
