Date: Wed, 18 Dec 2013 14:10:07 +0000 (UTC)
From: "Yin Huai (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851756#comment-13851756 ]

Yin Huai commented on HIVE-5945:
--------------------------------

I left two minor comments on the review board. Two additional comments:

When we find {code}bigTableFileAlias != null{code}, can we also log sumOfOthers and the small-table size threshold?
That way, the log entry will show the size of the big table, the total size of the other (small) tables, and the small-table size threshold. Also, can you add a unit test? Thanks :)

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5945
>                 URL: https://issues.apache.org/jira/browse/HIVE-5945
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt
>
>
> Here is an example:
> {code}
> select
>   i_item_id,
>   s_state,
>   avg(ss_quantity) agg1,
>   avg(ss_list_price) agg2,
>   avg(ss_coupon_amt) agg3,
>   avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>   cd_gender = 'F' and
>   cd_marital_status = 'U' and
>   cd_education_status = 'Primary' and
>   d_year = 2002 and
>   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>   i_item_id,
>   s_state
> order by
>   i_item_id,
>   s_state
> limit 100;
> {code}
> I turned off hive.auto.convert.join.noconditionaltask, so I expected this query to run as 4 map-only jobs. However, I got 1 map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce-side joins).
> So, I checked the conditional task that determines the plan of the join involving item.
> In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query as well as the intermediate table generated by joining store_sales and date_dim. So, when we sum the sizes of all small tables, the size of store_sales (around 45GB in my test) is also counted, even though store_sales is not an input of this join. The sum therefore exceeds the small-table threshold and the resolver falls back to the reduce-side join.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
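To make the problem and the requested logging concrete, here is a minimal, self-contained Java sketch. It is not Hive's actual resolveMapJoinTask: the class MapJoinResolverSketch, the method resolve, the Resolution holder, the alias ss_dd_tmp (standing in for the store_sales JOIN date_dim intermediate), the 25 MB threshold, and every table size except the ~45GB store_sales are hypothetical. It walks aliases from largest to smallest, picks the first big-table candidate, and sums the rest, once over every alias in the size map (the reported behaviour) and once over only the inputs of the child join (the behaviour the issue title asks for).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapJoinResolverSketch {

    /** Result of one resolution, carrying the three numbers the review asks to log. */
    static final class Resolution {
        final String bigAlias;   // largest big-table candidate among the considered aliases, or null
        final long bigSize;
        final long sumOfOthers;  // total size of every other considered alias
        final long threshold;

        Resolution(String bigAlias, long bigSize, long sumOfOthers, long threshold) {
            this.bigAlias = bigAlias;
            this.bigSize = bigSize;
            this.sumOfOthers = sumOfOthers;
            this.threshold = threshold;
        }

        /** Alias to use as the map-join big table, or null to keep the common (reduce-side) join. */
        String mapJoinAlias() {
            return (bigAlias != null && sumOfOthers <= threshold) ? bigAlias : null;
        }

        String logLine() {
            return String.format("big table %s: %d bytes, sum of other tables: %d bytes, threshold: %d bytes",
                    bigAlias, bigSize, sumOfOthers, threshold);
        }
    }

    /**
     * Hypothetical resolver. childAliases == null mimics the reported behaviour (every alias in the
     * size map is summed); a non-null set restricts the sum to tables the child join actually reads.
     */
    static Resolution resolve(Map<String, Long> aliasToFileSize, Set<String> childAliases,
                              Set<String> bigTableCandidates, long threshold) {
        List<Map.Entry<String, Long>> bySizeDesc = new ArrayList<>(aliasToFileSize.entrySet());
        bySizeDesc.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        String bigAlias = null;
        long bigSize = 0L;
        long sumOfOthers = 0L;
        for (Map.Entry<String, Long> e : bySizeDesc) {
            if (childAliases != null && !childAliases.contains(e.getKey())) {
                continue; // not an input of this join, so it must not count as a small table
            }
            if (bigAlias == null && bigTableCandidates.contains(e.getKey())) {
                bigAlias = e.getKey(); // first (largest) candidate becomes the streamed big table
                bigSize = e.getValue();
                continue;
            }
            sumOfOthers += e.getValue();
        }
        return new Resolution(bigAlias, bigSize, sumOfOthers, threshold);
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        final long gb = 1024L * mb;
        // Hypothetical sizes; only store_sales (~45 GB) comes from the report.
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("store_sales", 45 * gb);
        sizes.put("ss_dd_tmp", 2 * gb);
        sizes.put("customer_demographics", 20 * mb);
        sizes.put("item", 10 * mb);
        sizes.put("date_dim", 5 * mb);
        sizes.put("store", 1 * mb);

        Set<String> child = new HashSet<>(Arrays.asList("ss_dd_tmp", "item")); // the join with item
        long threshold = 25 * mb; // assumed small-table threshold

        Resolution before = resolve(sizes, null, child, threshold);
        Resolution after = resolve(sizes, child, child, threshold);
        System.out.println("all aliases:   " + before.logLine() + " -> map join on " + before.mapJoinAlias());
        System.out.println("child aliases: " + after.logLine() + " -> map join on " + after.mapJoinAlias());
    }
}
```

When every alias is summed, store_sales alone pushes the sum far past the threshold, so mapJoinAlias() returns null and the plan stays a reduce-side join; restricting the sum to the join's own inputs leaves only item, and the map join is chosen. The logLine() string is one possible shape for the log entry requested above (big table size, sumOfOthers, threshold).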