Date: Mon, 30 Dec 2013 21:52:50 +0000 (UTC)
From: "Yin Huai (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

    [ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859114#comment-13859114 ]

Yin Huai commented on HIVE-5945:
--------------------------------

Thanks Navis :) I played with your patch and found an issue, which I commented on at the review board. I am also attaching more info here. For the query in the description, we can have 4 map-joins, and there will be 3 different intermediate tables, all called $INTNAME.
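Because all three intermediates share the alias $INTNAME, a size lookup keyed only by that name cannot tell them apart. The following is a minimal, hypothetical illustration (not the patch's code) of keeping one size entry per conditional task instead. The task labels, the `key` helper and the `observedSizes` method are invented for the example; the byte counts are the HDFS Write values of Jobs 0, 1 and 2 in the job log below, assuming each of those jobs produced the $INTNAME read by the next map-join.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: one size entry per (conditional task, alias) pair. */
public class IntermediateSizes {

    // Composite key so the three "$INTNAME" inputs do not share one entry.
    static String key(String task, String alias) {
        return task + "/" + alias;
    }

    // Assumed mapping of job outputs (HDFS Write) to the map-joins that read them.
    static Map<String, Long> observedSizes() {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put(key("map-join2", "$INTNAME"), 20815654L); // written by Job 0
        sizes.put(key("map-join3", "$INTNAME"), 28593993L); // written by Job 1
        sizes.put(key("map-join4", "$INTNAME"), 378063L);   // written by Job 2
        return sizes;
    }

    public static void main(String[] args) {
        // The resolver log reports 20815654 for map-join4's $INTNAME;
        // a per-task lookup returns the size of the intermediate it actually reads.
        System.out.println(observedSizes().get(key("map-join4", "$INTNAME"))); // prints 378063
    }
}
```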

The current patch does not update the size of $INTNAME. Here are the logs.
{code}
13/12/30 16:48:25 INFO ql.Driver: MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 12.76 sec HDFS Read: 388445624 HDFS Write: 20815654 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 0: Map: 1 Cumulative CPU: 12.76 sec HDFS Read: 388445624 HDFS Write: 20815654 SUCCESS
Job 1: Map: 1 Cumulative CPU: 9.18 sec HDFS Read: 20816111 HDFS Write: 28593993 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 1: Map: 1 Cumulative CPU: 9.18 sec HDFS Read: 20816111 HDFS Write: 28593993 SUCCESS
Job 2: Map: 1 Cumulative CPU: 17.38 sec HDFS Read: 80660331 HDFS Write: 378063 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 2: Map: 1 Cumulative CPU: 17.38 sec HDFS Read: 80660331 HDFS Write: 378063 SUCCESS
Job 3: Map: 1 Cumulative CPU: 2.06 sec HDFS Read: 378520 HDFS Write: 96 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 3: Map: 1 Cumulative CPU: 2.06 sec HDFS Read: 378520 HDFS Write: 96 SUCCESS
Job 4: Map: 1 Reduce: 1 Cumulative CPU: 2.45 sec HDFS Read: 553 HDFS Write: 96 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 4: Map: 1 Reduce: 1 Cumulative CPU: 2.45 sec HDFS Read: 553 HDFS Write: 96 SUCCESS
Job 5: Map: 1 Reduce: 1 Cumulative CPU: 2.33 sec HDFS Read: 553 HDFS Write: 0 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 5: Map: 1 Reduce: 1 Cumulative CPU: 2.33 sec HDFS Read: 553 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 46 seconds 160 msec
{code}

{code}
Map-join1:
plan.ConditionalResolverCommonJoin: Driver alias is store_sales with size 388445409 (total size of others : 0, threshold : 25000000)
Stage-28 is selected by condition resolver.
Map-join2:
plan.ConditionalResolverCommonJoin: Driver alias is $INTNAME with size 20815654 (total size of others : 5051899, threshold : 25000000)
Stage-26 is selected by condition resolver.
Map-join3:
plan.ConditionalResolverCommonJoin: Driver alias is customer_demographics with size 80660096 (total size of others : 20815654, threshold : 25000000)
Stage-24 is filtered out by condition resolver.
Map-join4:
plan.ConditionalResolverCommonJoin: Driver alias is $INTNAME with size 20815654 (total size of others : 3155, threshold : 25000000)
Stage-22 is selected by condition resolver.
{code}

By the way, a minor question: why does the log for map-join 1 show the total size of others as 0?

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5945
>                 URL: https://issues.apache.org/jira/browse/HIVE-5945
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt, HIVE-5945.4.patch.txt, HIVE-5945.5.patch.txt
>
>
> Here is an example:
> {code}
> select
>   i_item_id,
>   s_state,
>   avg(ss_quantity) agg1,
>   avg(ss_list_price) agg2,
>   avg(ss_coupon_amt) agg3,
>   avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>   cd_gender = 'F' and
>   cd_marital_status = 'U' and
>   cd_education_status = 'Primary' and
>   d_year = 2002 and
>   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>   i_item_id,
>   s_state
> order by
>   i_item_id,
>   s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 Map-only jobs for this query.
> However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task that determines the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query plus the intermediate table generated by joining store_sales and date_dim. So, when we sum the sizes of all small tables, the size of store_sales (around 45GB in my test) is also counted, even though store_sales is not an input of this join.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
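The summing problem described in the issue can be sketched in Java. This is a minimal sketch under stated assumptions, not Hive's code: the class `DriverAliasSketch` and the method `chooseDriver` are hypothetical, and the decision rule (pick the largest candidate alias as the driver, then require the combined size of the other participating aliases to be at most the threshold) is a simplification of what the resolver log lines suggest. The `item` size is taken from the "total size of others" value logged for Map-join2, on the assumption that item is the only other input of that join. Passing every key of the size map as the participant set mimics the behaviour reported in the issue; passing only the join's own inputs is the intended behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the map-join decision; not Hive's actual API. */
public class DriverAliasSketch {

    /**
     * Returns the driver (big table) alias for a map join, or null to fall
     * back to the common (reduce-side) join.
     *
     * @param aliasToSize  known input sizes in bytes, possibly including
     *                     tables that the child join does not read
     * @param candidates   aliases that may serve as the driver
     * @param participants aliases actually read by the child join
     * @param threshold    maximum combined size of the small tables
     */
    static String chooseDriver(Map<String, Long> aliasToSize, Set<String> candidates,
                               Set<String> participants, long threshold) {
        String driver = null;
        long driverSize = -1;
        long total = 0;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            String alias = e.getKey();
            long size = e.getValue();
            if (!participants.contains(alias)) {
                continue; // skip tables the child join never reads
            }
            total += size;
            if (candidates.contains(alias) && size > driverSize) {
                driver = alias;
                driverSize = size;
            }
        }
        if (driver == null) {
            return null;
        }
        long others = total - driverSize; // combined size of the small tables
        return others <= threshold ? driver : null;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("store_sales", 388445409L); // from the Map-join1 log line
        sizes.put("$INTNAME", 20815654L);     // from the Map-join2 log line
        sizes.put("item", 5051899L);          // assumed: "total size of others" for Map-join2
        Set<String> joinInputs = Set.of("$INTNAME", "item");
        long threshold = 25000000L;

        // Only the join's own inputs are summed: prints $INTNAME (map join).
        System.out.println(chooseDriver(sizes, joinInputs, joinInputs, threshold));
        // Every known table is summed, store_sales included: prints null (reduce join).
        System.out.println(chooseDriver(sizes, joinInputs, sizes.keySet(), threshold));
    }
}
```

With these numbers, restricting the sum to the join's inputs leaves 5051899 bytes of small tables, under the 25000000 threshold, while summing everything adds store_sales and raises the total to 393497308 bytes, so the map join is rejected.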