hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-741) NULL is not handled correctly in join
Date Mon, 23 Aug 2010 17:54:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901508#action_12901508 ]

He Yongqiang commented on HIVE-741:

Also, regarding Ning's comments:
>> 2) SMBMapJoinOperator.compareKey() is called for each row, so it is critical for performance.
>> In your code, hasNullElement() could be called 4 times in the worst case. If you cache
>> the result, it need only be called twice.
Agreed. I'm not sure how much overhead there is; I will try to estimate it on production
runs. It would be great if you could cache the null-check results so that the check
happens only once per key.
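The caching idea can be illustrated with a small Java sketch. It assumes a hypothetical JoinKey wrapper (not Hive's actual classes) that scans its columns for NULL once, when the key is built, so a per-row comparison in the spirit of compareKey() only reads a cached boolean. CachedNullCheck, JoinKey, scanForNull and keysMatch are all illustrative names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: cache each key's NULL check so per-row comparisons reuse it.
final class CachedNullCheck {
    // Counts how many times the column scan for NULL actually runs.
    static int nullScans = 0;

    // Illustrative join key; the NULL check is computed once, when the key is built.
    static final class JoinKey {
        final List<Object> values;
        final boolean hasNull;

        JoinKey(Object... values) {
            this.values = Arrays.asList(values);
            this.hasNull = scanForNull(this.values);
        }
    }

    // Linear scan over the key columns: the work we want to do only once per key.
    static boolean scanForNull(List<Object> values) {
        nullScans++;
        for (Object v : values) {
            if (v == null) {
                return true;
            }
        }
        return false;
    }

    // Called for every row pair; reads the cached flags instead of rescanning.
    static boolean keysMatch(JoinKey a, JoinKey b) {
        if (a.hasNull || b.hasNull) {
            return false; // a key containing NULL never satisfies an equi-join
        }
        return a.values.equals(b.values);
    }

    public static void main(String[] args) {
        JoinKey left = new JoinKey(18, "x");
        JoinKey right = new JoinKey(18, "x");
        JoinKey withNull = new JoinKey(null, "x");
        for (int i = 0; i < 1000; i++) {
            keysMatch(left, right);
            keysMatch(left, withNull);
        }
        System.out.println(keysMatch(left, right));    // true
        System.out.println(keysMatch(left, withNull)); // false
        System.out.println(nullScans);                 // 3: one scan per key
    }
}
```

Comparing the same three keys thousands of times still performs exactly three scans; without the cached flag, every call to keysMatch would rescan both keys.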

> NULL is not handled correctly in join
> -------------------------------------
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, patch-741-4.txt,
>                      patch-741-5.txt, patch-741.txt, smbjoin_nulls.q.txt
> With the following data in table input4_cb:
> Key     Value
> ------  --------
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18   NULL
> The correct result should be an empty set.
> When 'null' is replaced by '', it works.
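The expected semantics can be sketched in Java as an inner hash join on a.key = b.value over the two example rows. This is an illustration, not Hive's join implementation; NullSafeJoin, Row and the skipNulls flag are hypothetical. One relevant detail: java.util.HashMap accepts null as a key, so a hash join that does not filter NULLs explicitly groups the two NULL columns together and reproduces the reported row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the join in the report, not Hive's implementation.
final class NullSafeJoin {
    // One row of input4_cb; boxed Integers so a column can hold NULL.
    static final class Row {
        final Integer key;
        final Integer value;

        Row(Integer key, Integer value) {
            this.key = key;
            this.value = value;
        }
    }

    // Inner hash join on a.key = b.value. When skipNulls is true, rows whose join
    // column is NULL are dropped on both sides, because NULL = x is never TRUE in SQL.
    static List<Row[]> join(List<Row> a, List<Row> b, boolean skipNulls) {
        Map<Integer, List<Row>> byValue = new HashMap<>();
        for (Row r : b) {
            if (skipNulls && r.value == null) {
                continue;
            }
            // HashMap accepts a null key, so without the check above NULLs are grouped together.
            byValue.computeIfAbsent(r.value, k -> new ArrayList<>()).add(r);
        }
        List<Row[]> out = new ArrayList<>();
        for (Row l : a) {
            if (skipNulls && l.key == null) {
                continue;
            }
            for (Row r : byValue.getOrDefault(l.key, Collections.<Row>emptyList())) {
                out.add(new Row[] {l, r});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> input4cb = Arrays.asList(new Row(null, 325), new Row(18, null));
        System.out.println(join(input4cb, input4cb, false).size()); // 1 (NULL 325 18 NULL)
        System.out.println(join(input4cb, input4cb, true).size());  // 0 (empty set)
    }
}
```

With skipNulls set to false, the join returns the single row NULL 325 18 NULL from the report; with it set to true, it returns the empty set, matching standard SQL semantics in which NULL = NULL is not true.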

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
