hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongzhi Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
Date Fri, 21 Aug 2015 02:11:46 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706117#comment-14706117
] 

Yongzhi Chen commented on HIVE-11604:
-------------------------------------

This issue has a workaround: set hive.optimize.ppd=false
Comparing query plan when ppd on/off, the major difference is a SelectOperator following PTF
Operator when ppd is off while there is no SelectOperator when ppd is on. 
Debugging the code: Originally ...PTF---> SEL..., ppd push a filter between them ...PTF-->FIL-->SEL;
After that, many optimizers work on the SEL(for example, ColumnPruner, SelectDecup ...) Then,
IdentityProjectRemover remove the SEL because it is an IdentitySelect. After the SEL is removed,
the plan is: PTF -->FIL-->File Output Operator . PTF does not have right column output
sequence, then later cause join return wrong result. We should not remove the first SEL following
the PTF operator. Make the code change accordingly.

> HIVE return wrong results in some queries with PTF function
> -----------------------------------------------------------
>
>                 Key: HIVE-11604
>                 URL: https://issues.apache.org/jira/browse/HIVE-11604
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 1.2.0, 1.1.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>
> Following query returns empty result which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
>     select id, fkey, 
>     row_number() over (partition by id, fkey) as rnum
>     from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After remove row_number() over (partition by id, fkey) as rnum from query, the right
result returns.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
>     select id, fkey, 
>     row_number() over (partition by id, fkey) as rnum
>     from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> ....
> INFO  : Ended Job = job_local1070163923_0017
> +---------+-----------+-----------+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +---------+-----------+-----------+--+
> +---------+-----------+-----------+--+
> 0: jdbc:hive2://localhost:10000> select ddd.id, ddd.fkey, aaa.name
> from (
>     select id, fkey 
>     from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:10000> from (
> 0: jdbc:hive2://localhost:10000>     select id, fkey 
> 0: jdbc:hive2://localhost:10000>     from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:10000>  ) ddd 
> 0: jdbc:hive2://localhost:10000> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +---------+-----------+-----------+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +---------+-----------+-----------+--+
> | 100     | 1         | key1      |
> | 200     | 1         | key1      |
> +---------+-----------+-----------+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message