hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9041) Generate better plan for queries containing both union and multi-insert [Spark Branch]
Date Fri, 12 Dec 2014 01:36:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243562#comment-14243562 ]

Xuefu Zhang commented on HIVE-9041:
-----------------------------------

So the problem is that a single static instance is used to cache IOContext (in the RDD
caching case), even though more than one input may be processed. Maybe we should use
something more sophisticated than a single static variable.
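
One possible direction, as a rough sketch only: keep one context per input, looked up by
input name, instead of one static instance shared by every input. The names below
(InputContextRegistry, InputContext, currentRow) are hypothetical and are not Hive's
actual IOContext API; the sketch only shows the shape of the idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a registry keyed by input name, replacing a single
// static context instance. None of these names come from the Hive code base.
final class InputContextRegistry {

    // Stand-in for the per-input state; the real IOContext holds more fields.
    // Individual contexts are not synchronized; each is assumed to be used
    // by the single task that reads that input.
    public static final class InputContext {
        private final String inputName;
        private long currentRow;

        InputContext(String inputName) { this.inputName = inputName; }

        public String getInputName() { return inputName; }
        public long getCurrentRow() { return currentRow; }
        public void setCurrentRow(long row) { this.currentRow = row; }
    }

    // One context per input name, created lazily on first lookup.
    private static final Map<String, InputContext> CONTEXTS = new ConcurrentHashMap<>();

    private InputContextRegistry() {}

    // Returns the context for the given input, creating it if absent.
    public static InputContext get(String inputName) {
        return CONTEXTS.computeIfAbsent(inputName, InputContext::new);
    }

    // Drops all cached contexts, e.g. between queries.
    public static void clear() { CONTEXTS.clear(); }

    public static void main(String[] args) {
        InputContext first = get("table0");
        InputContext second = get("table1");
        first.setCurrentRow(10L);
        second.setCurrentRow(20L);
        // Each input keeps its own state; neither update overwrites the other.
        System.out.println(first.getInputName() + " -> " + first.getCurrentRow());
        System.out.println(second.getInputName() + " -> " + second.getCurrentRow());
    }
}
```

With this layout, the two map inputs in the union query would each get their own
context, so a cached RDD for one input would not overwrite the state of the other.
Whether the input name is the right key for the Spark branch is an open question.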

> Generate better plan for queries containing both union and multi-insert [Spark Branch]
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-9041
>                 URL: https://issues.apache.org/jira/browse/HIVE-9041
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Chao
>
> This is a follow-up for HIVE-8920. For queries like:
> {code}
> from (select * from table0 union all select * from table1) s
> insert overwrite table table3 select s.x, count(1) group by s.x
> insert overwrite table table4 select s.y, count(1) group by s.y;
> {code}
> Currently we generate the following plan:
> {noformat}
>     M1    M2
>       \  / \
>        U3   R5
>        |
>        R4
> {noformat}
> It's better, however, to have the following plan:
> {noformat}
>    M1  M2
>    |\  /|
>    | \/ |
>    | /\ |
>    R4  R5
> {noformat}
> Also, we can do some research in this JIRA to see if it's possible
> to remove UnionWork once and for all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
