hive-dev mailing list archives

From "Navis Ryu" <navis....@nexr.com>
Subject Re: Review Request 16172: ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
Date Mon, 30 Dec 2013 02:20:02 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16172/
-----------------------------------------------------------

(Updated Dec. 30, 2013, 2:20 a.m.)


Review request for hive.


Changes
-------

Added log & test case


Bugs: HIVE-5945
    https://issues.apache.org/jira/browse/HIVE-5945


Repository: hive-git


Description
-------

Here is an example query:
{code}
select
   i_item_id,
   s_state,
   avg(ss_quantity) agg1,
   avg(ss_list_price) agg2,
   avg(ss_coupon_amt) agg3,
   avg(ss_sales_price) agg4
FROM store_sales
JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
where
   cd_gender = 'F' and
   cd_marital_status = 'U' and
   cd_education_status = 'Primary' and
   d_year = 2002 and
   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
group by
   i_item_id,
   s_state
order by
   i_item_id,
   s_state
limit 100;
{code}
I turned off noconditionaltask, so I expected 4 map-only jobs for this
query. However, I got 1 map-only job (joining store_sales and date_dim) and 3 MR jobs (for
reduce joins).

So I checked the conditional task that determines the plan for the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask,
aliasToFileSizeMap contains every input table used in the query, plus the intermediate table
generated by joining store_sales and date_dim. As a result, when we sum the sizes of the small tables,
the size of store_sales (around 45GB in my test) is counted as well, even though the child of this
conditional task does not use it. That inflated total is what keeps the join from being planned as a map join.
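
To make the over-counting concrete, here is a minimal sketch in plain Java. It is not the code in ConditionalResolverCommonJoin and not the patch under review: the class name, both method names, the "$INTNAME" alias for the intermediate table, the threshold, and every size except the roughly 45GB for store_sales are assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; none of these names come from Hive.
final class JoinSizeSketch {

    // Sums every alias in the map except the candidate big table.
    // Aliases that this join never reads still count toward the total.
    static long sumAllExcept(Map<String, Long> aliasToSize, String bigAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            if (!e.getKey().equals(bigAlias)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Sums only the aliases this join actually reads, minus the big table.
    static long sumParticipantsExcept(Map<String, Long> aliasToSize,
                                      Set<String> participants,
                                      String bigAlias) {
        long sum = 0;
        for (String alias : participants) {
            if (!alias.equals(bigAlias)) {
                sum += aliasToSize.getOrDefault(alias, 0L);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        final long MB = 1L << 20;
        final long GB = 1L << 30;

        // Sizes are invented, apart from store_sales being about 45GB.
        Map<String, Long> aliasToSize = new LinkedHashMap<>();
        aliasToSize.put("store_sales", 45 * GB);
        aliasToSize.put("date_dim", 10 * MB);
        aliasToSize.put("item", 50 * MB);
        aliasToSize.put("$INTNAME", 2 * GB); // result of store_sales JOIN date_dim

        // The join under consideration reads only the intermediate result and item.
        Set<String> joinInputs = Set.of("$INTNAME", "item");
        long threshold = 100 * MB; // assumed small-table limit

        long naive = sumAllExcept(aliasToSize, "$INTNAME");
        long scoped = sumParticipantsExcept(aliasToSize, joinInputs, "$INTNAME");

        System.out.println("all aliases:     " + naive + " fits=" + (naive <= threshold));
        System.out.println("join participants: " + scoped + " fits=" + (scoped <= threshold));
    }
}
```

With these made-up numbers, the sum over all aliases is dominated by store_sales and exceeds the limit, while the sum restricted to the tables the join actually reads is just the size of item.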


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java daf4e4a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 37ed275
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java f75e366
  ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java 67203c9
  ql/src/test/results/clientpositive/auto_join25.q.out 7427239
  ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out 7d06739
  ql/src/test/results/clientpositive/mapjoin_hook.q.out d60d16e

Diff: https://reviews.apache.org/r/16172/diff/


Testing
-------


Thanks,

Navis Ryu

