Date: Tue, 2 Sep 2014 21:34:21 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-7503) Support Hive's multi-table insert query with Spark [Spark Branch]

     [ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-7503:
------------------------------
    Labels: spark-m1  (was: )

> Support Hive's multi-table insert query with Spark [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-7503
>                 URL: https://issues.apache.org/jira/browse/HIVE-7503
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>              Labels: spark-m1
>         Attachments: HIVE-7503.1-spark.patch
>
>
> For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be recomputed unless it's staged. Even with staging, the inserts will happen sequentially, which hurts performance.
> This task is to find out what it takes in Spark to enable this without staging the source or inserting sequentially. If this has to be solved in Hive, find an optimal way to do it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)