hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-742) joins dont handle null correctly
Date Sat, 08 Aug 2009 05:22:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740838#action_12740838
] 

Namit Jain commented on HIVE-742:
---------------------------------

Pasted from http://issues.apache.org/jira/browse/HIVE-734 as Ning corrrectly found out.

The issue is not null vs. null comparison - the join happens at the reducer where the keys
are not even taken into consideration.
NULL hashes to the same value, which explains this behavior.

The correct fix would be to filter the rows with NULL keys at the mapper itself.

However, we need to figure out the following:

1. Expected semantics when join key consists of a null and a non-null column. If that is also
same (4, NULL) does not join with (4, NULL), then we can safely
    filter all rows even if one key is NULL.
2. Expected behavior of outer joins.



> joins dont handle null correctly 
> ---------------------------------
>
>                 Key: HIVE-742
>                 URL: https://issues.apache.org/jira/browse/HIVE-742
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>
> Ning Zhang added a comment - 07/Aug/09 10:46 AM
> BTW, I tried joins and it doesn't handle NULL semantics correctly. Here's the data and
results:
> Table:
> Key Value
> ------- -------
> NULL 325
> 18 NULL
> Query:
> select * from input4_cb a join input4_cb b on a.key = b.value;
> Result:
> NULL 325 18 NULL
> The correct result should be empty set.
> I guess the NULL vs. NULL comparison is incorrectly evaluated.
> [ Show ยป ]
> Ning Zhang added a comment - 07/Aug/09 10:46 AM BTW, I tried joins and it doesn't handle
NULL semantics correctly. Here's the data and results: Table: Key Value ------- ------- NULL
325 18 NULL Query: select * from input4_cb a join input4_cb b on a.key = b.value; Result:
NULL 325 18 NULL The correct result should be empty set. I guess the NULL vs. NULL comparison
is incorrectly evaluated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message