Message-ID: <12123810.63411283144874140.JavaMail.jira@thor>
Date: Mon, 30 Aug 2010 01:07:54 -0400 (EDT)
From: "Ning Zhang (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins
In-Reply-To: <8530693.38691282940093318.JavaMail.jira@thor>

     [ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1605:
-----------------------------

    Attachment: HIVE-1605.patch

Passed all tests except scriptfile1.q in TestMinimrCliDriver on Hadoop 0.20. This test also failed on trunk.

> regression and improvements in handling NULLs in joins
> ------------------------------------------------------
>
>                 Key: HIVE-1605
>                 URL: https://issues.apache.org/jira/browse/HIVE-1605
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. SMBMapJoinOperator throws a lot of OOM exceptions. This is caused by the HashMap maintained for each key to remember whether it is NULL, which takes too much memory when the tables are large.
> A second issue is in handling NULLs when the join key has more than one column. This appears in regular MapJoin as well as SMBMapJoin. The code only checks whether all the columns are NULL. The match should return false if any joined value is NULL.
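
To illustrate the multi-column NULL rule described in the issue, here is a minimal sketch in plain Java. It is not the code from HIVE-1605.patch or from Hive's join operators; the class `NullSafeJoinKey` and the methods `allNull`, `anyNull`, and `keysMatch` are hypothetical names, and a join key is assumed to be a simple list of column values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these names do not exist in Hive.
final class NullSafeJoinKey {

    // The check the issue describes as insufficient:
    // true only when every column of the key is NULL.
    static boolean allNull(List<Object> key) {
        for (Object col : key) {
            if (col != null) {
                return false;
            }
        }
        return true;
    }

    // The check the issue asks for:
    // true as soon as one column of the key is NULL.
    static boolean anyNull(List<Object> key) {
        for (Object col : key) {
            if (col == null) {
                return true;
            }
        }
        return false;
    }

    // Two keys match only if neither contains a NULL column
    // and all columns are equal.
    static boolean keysMatch(List<Object> left, List<Object> right) {
        if (anyNull(left) || anyNull(right)) {
            return false;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<Object> partialNull = Arrays.<Object>asList(1, null);
        List<Object> complete = Arrays.<Object>asList(1, "x");

        // An all-NULL check lets a partially NULL key through.
        System.out.println(allNull(partialNull));                // false
        // The any-NULL rule rejects it, even against an identical key.
        System.out.println(keysMatch(partialNull, partialNull)); // false
        // Keys without NULLs still match on equality.
        System.out.println(keysMatch(complete, complete));       // true
    }
}
```

With the key `(1, NULL)`, an all-NULL check does not flag the key, so two such rows could be joined. Checking for any NULL column rejects the pair, which follows the usual SQL equality semantics where a comparison involving NULL is never true.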