hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-8793) Make sure multi-insert works with map join [Spark Branch]
Date Wed, 12 Nov 2014 14:38:34 GMT


Xuefu Zhang commented on HIVE-8793:

Hi [~lirui], thanks for working on this. The task graph change above is expected. My only
concern is whether and how Spark's RDD.cache() is used. Reducer5 and Reducer6
will have the same input and the same shuffle, so it's inefficient for them to do the same work
repeatedly. With HIVE-8118, RDD cache() can be added when SparkPlanGenerator generates the plan,
but I'm not sure that logic is still in place. I will take a look at your patch to understand
this better. Thanks.
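
To illustrate the concern, here is a toy sketch in plain Java. It is not Hive or Spark code: the class SharedInputSketch and the methods expensiveShuffle, cached, and runBothReducers are hypothetical names made up for this example, and the cached wrapper only loosely mimics what RDD.cache() is meant to achieve. It assumes two consumers (standing in for Reducer5 and Reducer6) that read the same lazily computed input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedInputSketch {
    // Counts how many times the shared upstream work actually runs.
    static final AtomicInteger shuffleRuns = new AtomicInteger();

    // Hypothetical stand-in for the common input plus shuffle.
    static List<Integer> expensiveShuffle() {
        shuffleRuns.incrementAndGet();
        return Arrays.asList(1, 2, 3, 4);
    }

    // Wraps a supplier so its value is computed once and then reused,
    // loosely analogous to marking a shared RDD as cached.
    static <T> Supplier<T> cached(Supplier<T> source) {
        return new Supplier<T>() {
            private T value;
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = source.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    // First consumer of the shared input (stand-in for one reducer).
    static int sum(Supplier<List<Integer>> input) {
        int total = 0;
        for (int v : input.get()) {
            total += v;
        }
        return total;
    }

    // Second consumer of the same input (stand-in for the other reducer).
    static int max(Supplier<List<Integer>> input) {
        int best = Integer.MIN_VALUE;
        for (int v : input.get()) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Runs both consumers and reports how often the upstream work ran.
    static int runBothReducers(Supplier<List<Integer>> input) {
        shuffleRuns.set(0);
        sum(input);
        max(input);
        return shuffleRuns.get();
    }

    public static void main(String[] args) {
        System.out.println("uncached upstream runs: "
                + runBothReducers(SharedInputSketch::expensiveShuffle));
        System.out.println("cached upstream runs: "
                + runBothReducers(cached(SharedInputSketch::expensiveShuffle)));
    }
}
```

Without the wrapper, each consumer triggers the upstream work again; with it, the work runs once and both consumers reuse the result. Whether the Spark branch actually caches the shared parent depends on the plan generation logic discussed above.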

> Make sure multi-insert works with map join [Spark Branch]
> ---------------------------------------------------------
>                 Key: HIVE-8793
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Rui Li
>         Attachments: HIVE-8793.1-spark.patch, HIVE-8793.2-spark.patch
> Currently, HIVE-8622 is implemented based on the assumption that, for a map join query,
a BaseWork would not have multiple children. Testing with subquery_multiinsert.q did
reveal that's the case. But we need to investigate this and make sure this won't happen
in general.

This message was sent by Atlassian JIRA
