Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative; boundary="===============6004353999865097858=="
MIME-Version: 1.0
Subject: Re: Review Request 16172: ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
From: "Navis Ryu"
To: "Navis Ryu", "Yin Huai", "hive"
Date: Thu, 02 Jan 2014 02:32:27 -0000
Message-ID: <20140102023227.29218.74028@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
Sender: "Navis Ryu"
X-ReviewGroup: hive
X-ReviewRequest-URL: https://reviews.apache.org/r/16172/
X-Sender: "Navis Ryu"
References: <20131230022002.23607.50868@reviews.apache.org>
In-Reply-To: <20131230022002.23607.50868@reviews.apache.org>
Reply-To: "Navis Ryu"
X-ReviewRequest-Repository: hive-git

--===============6004353999865097858==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16172/
-----------------------------------------------------------

(Updated Jan. 2, 2014, 2:32 a.m.)


Review request for hive.


Changes
-------

Do not share the sizes of intermediate tables among ConditionalTasks.


Bugs: HIVE-5945
    https://issues.apache.org/jira/browse/HIVE-5945


Repository: hive-git


Description
-------

Here is an example:
{code}
select
  i_item_id,
  s_state,
  avg(ss_quantity) agg1,
  avg(ss_list_price) agg2,
  avg(ss_coupon_amt) agg3,
  avg(ss_sales_price) agg4
FROM store_sales
JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
where
  cd_gender = 'F' and
  cd_marital_status = 'U' and
  cd_education_status = 'Primary' and
  d_year = 2002 and
  s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
group by
  i_item_id,
  s_state
order by
  i_item_id,
  s_state
limit 100;
{code}

I turned off noconditionaltask, so I expected 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce joins).

So, I checked the conditional task that determines the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query as well as the intermediate table generated by joining store_sales and date_dim. So, when we sum the sizes of all small tables, the size of store_sales (which is around 45GB in my test) is also counted, even though store_sales is not an input of that join task.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e7aa2c9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 37ed275
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java f75e366
  ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java 67203c9
  ql/src/test/results/clientpositive/auto_join25.q.out 7427239
  ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out 7d06739
  ql/src/test/results/clientpositive/mapjoin_hook.q.out d60d16e

Diff: https://reviews.apache.org/r/16172/diff/


Testing
-------


Thanks,

Navis Ryu

--===============6004353999865097858==--
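
To make the size accounting described above concrete, here is a minimal, self-contained sketch. It is not the Hive code or the patch under review: the class name SmallTableSizeSketch, both method names, the "$INTNAME" alias used for the intermediate table, the 25MB threshold, and every size other than the roughly 45GB of store_sales are assumptions chosen for illustration. It only contrasts summing every alias in a shared size map with summing just the aliases that are inputs of the join task being resolved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SmallTableSizeSketch {

    // Problematic accounting: every alias in the shared map except the big
    // table is counted, whether or not it is an input of this join task.
    static long sumAllExceptBig(Map<String, Long> aliasToSize, String bigAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            if (!e.getKey().equals(bigAlias)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Intended accounting: only aliases that are inputs of this join task are
    // counted. Aliases with no recorded size contribute 0 in this sketch.
    static long sumParticipantsExceptBig(Map<String, Long> aliasToSize,
                                         Set<String> participants,
                                         String bigAlias) {
        long sum = 0;
        for (String alias : participants) {
            if (!alias.equals(bigAlias)) {
                sum += aliasToSize.getOrDefault(alias, 0L);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        final long MB = 1L << 20;
        final long GB = 1L << 30;

        // Sizes shared across the whole query (all but store_sales are made up).
        Map<String, Long> aliasToSize = new LinkedHashMap<>();
        aliasToSize.put("store_sales", 45 * GB); // "around 45GB" per the description
        aliasToSize.put("date_dim", 10 * MB);    // hypothetical
        aliasToSize.put("item", 5 * MB);         // hypothetical
        aliasToSize.put("$INTNAME", 2 * GB);     // hypothetical intermediate result

        // The join involving item reads only the intermediate table and item.
        Set<String> participants = new HashSet<>(Arrays.asList("$INTNAME", "item"));
        String bigAlias = "$INTNAME";
        long threshold = 25 * MB;                // hypothetical small-table limit

        long all = sumAllExceptBig(aliasToSize, bigAlias);
        long used = sumParticipantsExceptBig(aliasToSize, participants, bigAlias);

        System.out.println("sum over all aliases fits: " + (all <= threshold));
        System.out.println("sum over task inputs fits: " + (used <= threshold));
    }
}
```

Under these assumed numbers, the first sum includes store_sales and exceeds any realistic small-table limit, so the join falls back to a reduce join; the second sum counts only item and stays under the limit, which is the outcome the description expects.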