hive-dev mailing list archives

From "Brock Noland (JIRA)" <>
Subject [jira] [Commented] (HIVE-4845) Correctness issue with MapJoins using the null safe operator
Date Tue, 16 Jul 2013 14:36:48 GMT


Brock Noland commented on HIVE-4845:

The comment from "Hive QA" was from my experimental pre-commit build. Results can be seen
> Correctness issue with MapJoins using the null safe operator
> ------------------------------------------------------------
>                 Key: HIVE-4845
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>            Priority: Critical
>         Attachments: HIVE-4845.patch, HIVE-4845.patch, HIVE-4845.patch
> I found a correctness issue while working on HIVE-4838. The following query from join_nullsafe.q
> gives different results depending on whether it is executed map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> b.key
> AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, on the map side, rows that should be joined are not. For example, the
> reduce side outputs this row:
> {noformat}
> a.key   a.value   b.key   b.value
> 148     NULL      148     NULL
> {noformat}
> which makes sense, since under the null-safe operator a.key is equal to b.key and a.value
> is equal to b.value, but the current map-side code omits this row. The reason is that
> MapJoinDoubleKey, which is used for the map-side join, does not compare null values properly.
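
To illustrate the kind of comparison the description calls for, here is a minimal Java sketch of a two-column join key whose equals and hashCode treat NULL as equal to NULL, which is what the <=> operator requires. The class name TwoColumnKey and its fields are hypothetical; this is not Hive's MapJoinDoubleKey and not the attached patch, only an assumption about what a null-safe key comparison could look like.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NullSafeKeyDemo {

    /** Hypothetical two-column join key (not Hive's MapJoinDoubleKey). */
    static final class TwoColumnKey {
        private final Object first;
        private final Object second;

        TwoColumnKey(Object first, Object second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof TwoColumnKey)) {
                return false;
            }
            TwoColumnKey that = (TwoColumnKey) other;
            // Objects.equals returns true when both arguments are null,
            // which mirrors the semantics of the <=> operator.
            return Objects.equals(first, that.first)
                && Objects.equals(second, that.second);
        }

        @Override
        public int hashCode() {
            // Objects.hash tolerates null elements, so keys with NULL
            // columns hash consistently and land in the same bucket.
            return Objects.hash(first, second);
        }
    }

    public static void main(String[] args) {
        // Build side of a hash join: key (148, NULL), as in the example row.
        Map<TwoColumnKey, String> hashTable = new HashMap<>();
        hashTable.put(new TwoColumnKey(148, null), "row from a");

        // Probe side: the matching row from b should find the entry.
        String match = hashTable.get(new TwoColumnKey(148, null));
        System.out.println(match != null);
    }
}
```

With a key like this, the probe for (148, NULL) finds the build-side entry, so the row shown above would be emitted by a hash-based join as well.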

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
