hive-dev mailing list archives

From "Chao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-8215) Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]
Date Mon, 22 Sep 2014 16:24:33 GMT
Chao created HIVE-8215:
--------------------------

             Summary: Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]
                 Key: HIVE-8215
                 URL: https://issues.apache.org/jira/browse/HIVE-8215
             Project: Hive
          Issue Type: Improvement
          Components: Spark
            Reporter: Chao


Currently, a multi-table insertion generates 1+N tasks: "1" is the task that generates the
input, and "N" are the insert tasks, each of which reads from that input and writes to a
separate output table.

To make these N tasks run in parallel, we currently rely on {{hive.exec.parallel}} being set
to {{true}}. This patch proposes an alternative approach: combine the N tasks into a single
task containing N separate operator trees, which at execution time produce N result RDDs.
We may then be able to execute these N RDDs in parallel inside Spark, without needing
{{hive.exec.parallel}}.
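
The 1+1 shape described above can be illustrated with a small, self-contained sketch. It is
plain Python, not Hive or Spark code: the names generate_input, operator_trees, and
run_combined_task are hypothetical, the lambdas stand in for operator trees, and a thread
pool stands in for whatever mechanism Spark would use to run the N result RDDs concurrently.
It only shows one input-producing step followed by one combined step that evaluates N
independent branches in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_input():
    # The "1" task: produce the shared input exactly once.
    return list(range(10))


# Hypothetical stand-ins for the N operator trees. Each one reads the same
# shared input and produces the rows for its own output table.
operator_trees = {
    "evens": lambda rows: [r for r in rows if r % 2 == 0],
    "squares": lambda rows: [r * r for r in rows],
    "large": lambda rows: [r for r in rows if r >= 7],
}


def run_combined_task(rows, trees):
    # The single combined task: submit every tree to a thread pool so the
    # branches run concurrently, then collect one result per output table.
    with ThreadPoolExecutor(max_workers=len(trees)) as pool:
        futures = {name: pool.submit(tree, rows) for name, tree in trees.items()}
        return {name: future.result() for name, future in futures.items()}


shared_input = generate_input()
outputs = run_combined_task(shared_input, operator_trees)
```

Here the parallelism lives inside the combined task itself, so no outer scheduler setting
(the role {{hive.exec.parallel}} plays for the 1+N plan) is needed to run the branches
concurrently.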



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
