hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14027) NULL values produced by left outer join do not behave as NULL
Date Mon, 11 Jul 2016 16:31:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371107#comment-15371107
] 

Jesus Camacho Rodriguez commented on HIVE-14027:
------------------------------------------------

Consider the original query:

{noformat}
select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 1) a  left outer
join  (select * from tbl where 1 = 2) b on a.n = b.n;
{noformat}

The field {{n}} from table {{a}} can become a constant {{1}} when we fold the OI. For every
record that the left outer join produces, {{n}} will always be {{1}}. The real problem is
that the _MapJoin_ reuses the OI of the key from the outer side for the key of the inner side.
This is fine for inner joins, but it is incorrect for outer joins.

_Solution 1_ would be to rewrite the _MapJoin_ operator logic. But that change is non-trivial
(at least for me that I am not familiar with the operator logic). After I studied it initially,
I did not proceed with it, as I would have to change the initialization logic, process logic,
UnwrapRowContainer, etc.

Thus, I went on with the _solution 2_ that is implemented in this patch. Consider an outer
join. If OI of outer/inner sides are different, we take the data from the outer side (as it
is done currently), but we apply the _converter_ logic on the data.

In fact, depending on the cost of deserializing vs cost of conversion, it might be that the
current solution is more efficient than _solution 1_.

> NULL values produced by left outer join do not behave as NULL
> -------------------------------------------------------------
>
>                 Key: HIVE-14027
>                 URL: https://issues.apache.org/jira/browse/HIVE-14027
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.1, 2.2.0
>            Reporter: Vaibhav Gumashta
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-14027.01.patch, HIVE-14027.02.patch, HIVE-14027.03.patch, HIVE-14027.patch
>
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 1) a  left
outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1    one    false    true
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into tbl, and isnull
returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message