hive-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (HIVE-15493) Wrong result for LEFT outer join in Tez using MapJoinOperator
Date Wed, 26 Jul 2017 00:04:12 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley closed HIVE-15493.
--------------------------------

> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -------------------------------------------------------------
>
>                 Key: HIVE-15493
>                 URL: https://issues.apache.org/jira/browse/HIVE-15493
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15493.01.patch, HIVE-15493.patch
>
>
> To reproduce, we can run in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1;
> CREATE TABLE test_1 (
>   member BIGINT,
>   age VARCHAR(100)
> )
> STORED AS TEXTFILE;
> DROP TABLE IF EXISTS test_2;
> CREATE TABLE test_2 (
>   member BIGINT
> )
> STORED AS TEXTFILE;
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40');
> INSERT INTO test_2 VALUES (1), (2), (3);
> SELECT
>   t2.member,
>   t1.age_1,
>   t1.age_2
> FROM test_2 t2
> LEFT JOIN (
>   SELECT
>     member,
>     age AS age_1,
>     age AS age_2
>   FROM test_1
> ) t1
> ON t2.member = t1.member;
> {code}
> Result is:
> {noformat}
> 1	20	NULL
> 3	40	NULL
> 2	30	NULL
> {noformat}
> Correct result is:
> {noformat}
> 1	20	20
> 3	40	40
> 2	30	30
> {noformat}
> The bug was introduced by HIVE-10582. Although the fix in HIVE-10582 does not include tests,
> it looks correct. The problem seems to be in the MapJoinOperator itself: it only happens for
> LEFT outer join (not for RIGHT outer or FULL outer), and only when there are duplicate values
> in the right part of the output. I am still trying to understand part of the MapJoinOperator
> code path, but the bug could be in the initialization of the operator.
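> One way to check that the MapJoinOperator is involved, offered as an unverified sketch
> rather than a confirmed step: disable automatic map-join conversion and rerun the query
> from the reproduction above. If the diagnosis holds, the join should no longer be converted
> to a map join, and age_2 should no longer come back as NULL.
> {code:sql}
> -- Assumption: with conversion disabled, the join is not executed by the MapJoinOperator.
> set hive.auto.convert.join=false;
> -- Same query as in the reproduction; test_1 and test_2 are the tables created there.
> SELECT
>   t2.member,
>   t1.age_1,
>   t1.age_2
> FROM test_2 t2
> LEFT JOIN (
>   SELECT
>     member,
>     age AS age_1,
>     age AS age_2
>   FROM test_1
> ) t1
> ON t2.member = t1.member;
> {code}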
> Until we have more time to study the problem in detail and fix the MapJoinOperator, I
> will submit a fix that removes the code in SemanticAnalyzer that reuses duplicated value
> expressions from RS to create multiple columns in the join output (this is equivalent to
> reverting HIVE-10582).
> Once this is pushed, I will create a follow-up issue to bring this code back and tackle
> the problem in the MapJoinOperator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
