hadoop-hive-dev mailing list archives

From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1605) regression and improvements in handling NULLs in joins
Date Mon, 30 Aug 2010 22:34:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904384#action_12904384 ]

John Sichi commented on HIVE-1605:
----------------------------------

@Namit:  I thought we don't need the DROP TABLE any more since Joy improved the test framework?


> regression and improvements in handling NULLs in joins
> ------------------------------------------------------
>
>                 Key: HIVE-1605
>                 URL: https://issues.apache.org/jira/browse/HIVE-1605
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1605.2.patch, HIVE-1605.3.patch, HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot of OOM exceptions
> in SMBMapJoinOperator. This is caused by the HashMap maintained for each key to remember whether
> it is NULL. This takes too much memory when the tables are large.
> A second issue is in handling NULLs if the join keys are more than 1 column. This appears
> in regular MapJoin as well as SMBMapJoin. The code only checks if all the columns are NULL.
> The match should return false if any joined value is NULL.
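
The intended multi-column behavior described above can be illustrated with a minimal, standalone sketch. It is not the code from the attached patches: the class and method names (JoinKeyNullCheck, hasAnyNull, keysMatch) are hypothetical, and representing a key as a List<Object> is an assumption made for illustration. It shows a match check that returns false as soon as any key column is NULL, computed per comparison instead of being remembered in a per-key map.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Hive's MapJoin/SMBMapJoin code.
class JoinKeyNullCheck {

    /** Returns true if any column of a composite join key is NULL. */
    static boolean hasAnyNull(List<Object> key) {
        for (Object column : key) {
            if (column == null) {
                return true;
            }
        }
        return false;
    }

    /**
     * Equi-join match: a key containing a NULL in any column never matches,
     * not even another key with NULLs in the same positions.
     */
    static boolean keysMatch(List<Object> left, List<Object> right) {
        if (left.size() != right.size()) {
            return false;
        }
        if (hasAnyNull(left) || hasAnyNull(right)) {
            return false;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<Object> a = Arrays.<Object>asList(1, "x");
        List<Object> b = Arrays.<Object>asList(1, "x");
        List<Object> partlyNull = Arrays.<Object>asList(1, null);
        List<Object> allNull = Arrays.<Object>asList(null, null);

        System.out.println(keysMatch(a, b));                   // both keys fully non-NULL and equal
        System.out.println(keysMatch(partlyNull, partlyNull)); // one column NULL
        System.out.println(keysMatch(allNull, allNull));       // all columns NULL
    }
}
```

A check that only tests "all columns are NULL" would reject the allNull pair but still let the partlyNull pair match, which is the case the description calls out.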

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

