hive-dev mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-4845) Correctness issue with MapJoins using the null safe operator
Date Tue, 16 Jul 2013 13:44:58 GMT


Hive QA commented on HIVE-4845:

{color:green}Overall:{color}: +1 all checks pass

{color:green}SUCCESS:{color} +1 all tests passed

Executing org.apache.hive.ptest.execution.CleanupPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
> Correctness issue with MapJoins using the null safe operator
> ------------------------------------------------------------
>                 Key: HIVE-4845
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>            Priority: Critical
>         Attachments: HIVE-4845.patch, HIVE-4845.patch, HIVE-4845.patch
> I found a correctness issue while working on HIVE-4838. The following query from join_nullsafe.q gives different results depending on whether it is executed map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> b.key
> AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, rows that should be joined are not joined on the map side. For example, the reduce side outputs this row:
> {noformat}
> a.key   a.value   b.key   b.value
> 148     NULL      148     NULL
> {noformat}
> This row is correct, since a.key equals b.key and a.value equals b.value under the null-safe operator, but the current map-side code omits it. The reason is that the map-side join uses MapJoinDoubleKey, which does not compare null values properly.
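
To illustrate the behaviour described above, here is a minimal Java sketch of a two-column hash-join key that treats NULL as equal to NULL, which is what the <=> operator requires. This is not the code from the attached patch and not Hive's MapJoinDoubleKey; the class NullSafeKey and the helpers sqlEquals and nullSafeEquals are hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NullSafeKeyDemo {

    // Hypothetical two-column join key (an illustration only, NOT Hive's MapJoinDoubleKey).
    static final class NullSafeKey {
        final Object first;
        final Object second;

        NullSafeKey(Object first, Object second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof NullSafeKey)) {
                return false;
            }
            NullSafeKey other = (NullSafeKey) o;
            // Objects.equals returns true when both arguments are null,
            // so two NULL columns compare as equal.
            return Objects.equals(first, other.first)
                && Objects.equals(second, other.second);
        }

        @Override
        public int hashCode() {
            // Objects.hash accepts null elements, so equal keys hash alike.
            return Objects.hash(first, second);
        }
    }

    // Ordinary '=' join semantics: a NULL on either side never matches.
    static boolean sqlEquals(Object a, Object b) {
        return a != null && b != null && a.equals(b);
    }

    // Null-safe '<=>' semantics: NULL matches NULL.
    static boolean nullSafeEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Build side of a hash join: one row with key=148, value=NULL.
        Map<NullSafeKey, String> hashTable = new HashMap<>();
        hashTable.put(new NullSafeKey(148, null), "a: 148, NULL");

        // Probe side: the same (148, NULL) pair must find the row.
        System.out.println(hashTable.containsKey(new NullSafeKey(148, null)));

        // The two comparison semantics differ only when NULLs are involved.
        System.out.println(sqlEquals(null, null));
        System.out.println(nullSafeEquals(null, null));
    }
}
```

With a key like this, the probe for (148, NULL) finds the build-side row, so the map-side join would emit the same row as the reduce side. A key whose equals rejects nulls, as in sqlEquals, would drop it.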

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
