hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled
Date Sat, 22 Mar 2014 02:35:43 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943851#comment-13943851 ]

Xuefu Zhang commented on HIVE-6395:
-----------------------------------

{quote}
Although I am curious why this is duplicated in HIVE-4293, which is about subquery + UDTF,
I wonder if it's a part of the main fix, or just an additional fix that got added?
{quote}

I didn't read the patch in HIVE-4293, but from Hive's perspective, UDTF is very similar to
TRANSFORM() except that the former is done via the UDTF's Java code, and the latter in an
external script via streaming. For this reason, the problem here might be a sub-problem of
HIVE-4293. This is just my guess.
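
To illustrate the streaming side: TRANSFORM() passes each input row to the script's stdin
as a tab-separated line and reads tab-separated rows back from its stdout. Below is a
minimal sketch of what a script such as the attached test.py might look like. The
attachment's contents are not shown in this message, so the function name transform and
the rule used to compute the extra state column are assumptions made up for the example.

```python
def transform(lines):
    """Yield each tab-separated input row with a hypothetical state column appended."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption for illustration only: odd ids get state 1, even ids get state 2.
        state = 1 if int(fields[0]) % 2 else 2
        yield "\t".join(fields + [str(state)])


# Demo with in-memory rows; a real streaming script would iterate over sys.stdin.
for row in transform(["1\talice\n", "2\tbob\n"]):
    print(row)
```

In an actual streaming script, the loop would read from sys.stdin instead of a list, and
each printed line would become one output row with the columns id, name, state.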

> multi-table insert from select transform fails if optimize.ppd enabled
> ----------------------------------------------------------------------
>
>                 Key: HIVE-6395
>                 URL: https://issues.apache.org/jira/browse/HIVE-6395
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>         Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column (state), and that column
> is used in the where clause of each multi-insert select. However, if hive.optimize.ppd is on,
> the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test2
> --> select, insert into test3
> The correct query plan, produced with hive.optimize.ppd=false, is:
> filter (state=1)
> --> select, insert into test2
> filter (state=2)
> --> select, insert into test3
> For reference
> {noformat}
> create table test (id int, name string);
> create table test2(id int, name string, state int);
> create table test3(id int, name string, state int);
> {noformat}
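
The difference between the two plans can be sketched in plain Python. This only
illustrates the filter semantics described in the issue and is not Hive's optimizer
code; the sample rows and the function names buggy_plan and correct_plan are made up
for the example.

```python
# Hypothetical output rows of the select transform (id, name, state).
rows = [
    {"id": 1, "name": "a", "state": 1},
    {"id": 2, "name": "b", "state": 2},
    {"id": 3, "name": "c", "state": 3},
]


def buggy_plan(rows):
    """One shared filter that ANDs both branch predicates, then feeds both inserts."""
    shared = [r for r in rows if r["state"] == 1 and r["state"] == 2]
    return shared, shared  # (rows for test2, rows for test3)


def correct_plan(rows):
    """Each insert branch applies its own filter to the transform output."""
    to_test2 = [r for r in rows if r["state"] == 1]
    to_test3 = [r for r in rows if r["state"] == 2]
    return to_test2, to_test3
```

Because no row can satisfy state=1 and state=2 at once, the merged filter passes nothing
to either insert, whereas the per-branch filters each pass their own matching rows.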



--
This message was sent by Atlassian JIRA
(v6.2#6252)
