From: "He Yongqiang (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Date: Fri, 8 Apr 2011 05:23:05 +0000 (UTC)
Message-ID: <642975531.43022.1302240185737.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <322818374.36046.1302039785971.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HIVE-2095) auto convert map join bug
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm

[
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2095:
-------------------------------

    Attachment: HIVE-2095.2.patch

> auto convert map join bug
> -------------------------
>
>                 Key: HIVE-2095
>                 URL: https://issues.apache.org/jira/browse/HIVE-2095
>             Project: Hive
>          Issue Type: Bug
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch
>
>
> 1)
> When considering a table as the big table candidate for a map join, if Hive can determine at compile time that the total known size of all the other tables (excluding the candidate) is bigger than a configured value, the candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may take more time.
> 2)
> Added a null check for backup tasks. Without it, a NullPointerException is thrown.
> 3)
> CommonJoinResolver needs to know the full mapping of pathToAliases. Otherwise it will make the wrong decision.
> 4)
> Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (an alias's input size that is known at compile time, from inputSummary), and the intermediate dir path.
> So the logic is: go over all the pathToAliases, and for each path, if it is from the intermediate dir path, add this path's size to all aliases. Finally, choose the big table based on the size information and other information such as aliasToTask.
> 5)
> The conditional task's children contain wrong options, which may cause the join to fail or produce incorrect results. Basically, when getting all possible children for the conditional task, a whitelist of big tables should be used. Only tables in this whitelist can be considered as a big table.
> Here is the logic:
> + * Get a list of big table candidates. Only the tables in the returned set can
> + * be used as big table in the join operation.
> + *
> + * The logic here is to scan the join condition array from left to right.
> + * If we see an inner join and bigTableCandidates is empty, add both sides of
> + * this inner join to the big table candidates. If we see a left outer join
> + * and bigTableCandidates is empty, add the left side to it; if
> + * bigTableCandidates is not empty, do nothing (which means the
> + * bigTableCandidates come from the left side). If we see a right outer join,
> + * clear bigTableCandidates and add the right side to it, which means the
> + * right side of a right outer join always wins. If we see a full outer join,
> + * return null immediately (no table can be the big table, so a map join
> + * cannot be done).

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
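
The left-to-right scan described in point 5 can be sketched roughly as follows. This is a minimal illustration of the rules as stated in the comment, not the code in the attached patch: the class `BigTableCandidates`, the `JoinCond` type, the `JoinType` enum, and the use of integer table positions are all hypothetical names chosen for the example. The description does not say what happens for an inner join when the candidate set is already non-empty, so the sketch assumes nothing is added in that case.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; every name here is hypothetical, not Hive's API. */
class BigTableCandidates {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One join condition between two table positions (hypothetical type). */
    static final class JoinCond {
        final int left;
        final int right;
        final JoinType type;

        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    /**
     * Scans the join conditions from left to right and returns the table
     * positions that may serve as the big table, or null if a full outer
     * join is present (no map join is possible).
     */
    static Set<Integer> getBigTableCandidates(List<JoinCond> conds) {
        Set<Integer> candidates = new LinkedHashSet<>();
        for (JoinCond cond : conds) {
            switch (cond.type) {
                case FULL_OUTER:
                    // No table can be the big table.
                    return null;
                case LEFT_OUTER:
                    // Only the left side qualifies; keep earlier candidates.
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                    }
                    break;
                case RIGHT_OUTER:
                    // The right side of a right outer join always wins.
                    candidates.clear();
                    candidates.add(cond.right);
                    break;
                case INNER:
                    // Assumption: a non-empty set is left unchanged.
                    if (candidates.isEmpty()) {
                        candidates.add(cond.left);
                        candidates.add(cond.right);
                    }
                    break;
            }
        }
        return candidates;
    }
}
```

Under these assumptions, an inner join between positions 0 and 1 followed by a left outer join between 1 and 2 leaves positions 0 and 1 as candidates, while a left outer join followed by a right outer join with position 2 on the right leaves only position 2.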