hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-741) NULL is not handled correctly in join
Date Tue, 10 Aug 2010 17:39:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896974#action_12896974 ]

Ning Zhang commented on HIVE-741:
---------------------------------

Regular reduce-side joins are implemented in JoinOperator and CommonJoinOperator. Map-side joins are implemented in MapJoinOperator.

In reduce-side joins, the join keys are used as the distribution keys from the mappers to the
reducers, so each group (marked by beginGroup() and endGroup()) consists of rows with the same
join key. The reduce-side join caches all rows within a group except those of the last table
(a.k.a. the streaming table), which is scanned and joined (as a Cartesian product) with the
cached rows of the other tables. I think the fix would be to check whether the join keys are
NULL and produce output according to the semantics of each join type.
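A minimal Python sketch of the reduce-side path described above, assuming a two-table inner join. It is not Hive's JoinOperator code; the function name and row layout are hypothetical. The shuffle is simulated with a dictionary which, like a hash partitioner, sends every NULL (None) key to the same group; that is presumably how the NULL rows in the report end up matched. Outer joins would need to emit NULL-key rows padded with NULLs instead of dropping them, which this sketch does not attempt.

```python
from collections import defaultdict


def reduce_side_inner_join(left, right, left_key, right_key):
    """Hypothetical sketch of a reduce-side inner join of two tables.

    left and right are lists of tuples; left_key and right_key are the
    indexes of the join columns. None stands in for SQL NULL.
    """
    # Simulated shuffle: group rows by join key, tagged by source table.
    # A dict groups all None keys together, just as a hash partitioner would.
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)

    output = []
    # One iteration per group (the span between beginGroup() and endGroup()).
    for key, (cached, streamed) in groups.items():
        # Proposed fix: NULL = NULL is not TRUE in SQL, so a NULL-key group
        # can never produce inner-join output.
        if key is None:
            continue
        # Rows of the first table are cached; rows of the last (streaming)
        # table are scanned and combined with every cached row.
        for s in streamed:
            for c in cached:
                output.append(c + s)
    return output
```

On the input4_cb data from the report, `reduce_side_inner_join(input4_cb, input4_cb, 0, 1)` returns an empty list; removing the `key is None` check reproduces the row `NULL 325 18 NULL`.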

The map-side join is basically a hash join: the small table is read entirely into a hash table,
which is then probed while scanning the streaming table.
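A comparable sketch of the map-side hash join, again hypothetical Python rather than MapJoinOperator's actual code: the small table is built into a dict keyed on its join column, and each row of the streaming table probes it. Skipping NULL keys on both the build and the probe side gives inner-join semantics.

```python
def map_side_hash_join(small, big, small_key, big_key):
    """Hypothetical sketch of a map-side (hash) inner join.

    small is loaded entirely into memory; big is streamed once.
    None stands in for SQL NULL.
    """
    # Build phase: hash the small table on its join column.
    table = {}
    for row in small:
        k = row[small_key]
        if k is None:
            continue  # a NULL key can never match, so don't store it
        table.setdefault(k, []).append(row)

    # Probe phase: scan the streaming table once.
    output = []
    for row in big:
        k = row[big_key]
        if k is None:
            continue  # probing with NULL must not hit any entry
        for match in table.get(k, []):
            output.append(match + row)
    return output
```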

There are other types of joins (bucketed map-side join, sort-merge join, etc.), but they all
rely on the three classes mentioned above.

Let me know if you have any further questions as you get started.

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> With the following data in table input4_cb:
> Key     Value
> ------  ------
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18   NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.
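To check results like the one above, a brute-force reference join can help: evaluate the ON predicate on every pair of rows under SQL three-valued logic and keep only the pairs for which it is TRUE. This Python sketch (helper names are hypothetical) returns an empty list for the input4_cb data, matching the expected empty set.

```python
from itertools import product
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """SQL '=' under three-valued logic: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def nested_loop_join(left, right, left_key, right_key):
    """Reference inner equi-join: keep a pair only when the ON predicate
    is TRUE; a NULL result rejects the pair, just like FALSE."""
    return [l + r for l, r in product(left, right)
            if sql_eq(l[left_key], r[right_key]) is True]
```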

-- 
This message is automatically generated by JIRA.

