hive-dev mailing list archives

From "Navis Ryu" <navis....@nexr.com>
Subject Re: Review Request 16172: ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
Date Fri, 27 Dec 2013 03:12:57 GMT


> On Dec. 18, 2013, 2:02 p.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java, line 427
> > <https://reviews.apache.org/r/16172/diff/2/?file=399281#file399281line427>
> >
> >     Seems it is not an error? If so, let's not put it in the ErrorMsg.

done.


> On Dec. 18, 2013, 2:02 p.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java, line 262
> > <https://reviews.apache.org/r/16172/diff/2/?file=399284#file399284line262>
> >
> >     Is this one necessary?

Changed it to a debug message.


- Navis


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16172/#review30612
-----------------------------------------------------------


On Dec. 18, 2013, 5:04 a.m., Navis Ryu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16172/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2013, 5:04 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-5945
>     https://issues.apache.org/jira/browse/HIVE-5945
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Here is an example:
> {code}
> select
>    i_item_id,
>    s_state,
>    avg(ss_quantity) agg1,
>    avg(ss_list_price) agg2,
>    avg(ss_coupon_amt) agg3,
>    avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>    cd_gender = 'F' and
>    cd_marital_status = 'U' and
>    cd_education_status = 'Primary' and
>    d_year = 2002 and
>    s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>    i_item_id,
>    s_state
> order by
>    i_item_id,
>    s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 map-only jobs for this query. However, I got 1 map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce-side joins).
> 
> So I checked the conditional task that determines the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query as well as the intermediate table generated by joining store_sales and date_dim. So when we sum the sizes of all small tables, the size of store_sales (around 45GB in my test) is also counted, even though store_sales is not an input of this join.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 45acc2b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9afc80b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 2efa7c2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java faf2f9b 
>   ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java 67203c9 
>   ql/src/test/results/clientpositive/auto_join25.q.out 7427239 
>   ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out 7d06739 
>   ql/src/test/results/clientpositive/mapjoin_hook.q.out d60d16e 
> 
> Diff: https://reviews.apache.org/r/16172/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Navis Ryu
> 
>
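The problem described in the quoted request comes down to which aliases are summed when the resolver checks whether the small tables fit. Below is a minimal Java sketch of the difference, not the actual patch in the diff. The class and method names, the intermediate alias `join_intermediate`, every table size other than the roughly 45GB store_sales figure, and the 25,000,000-byte threshold are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not Hive's ConditionalResolverCommonJoin code.
public class SmallTableSizeSketch {

    // Sum only the aliases that are inputs of this join task, skipping the
    // candidate big table (it is streamed, not held in memory).
    static long sumSmallTables(Map<String, Long> aliasToFileSize,
                               Set<String> joinInputAliases,
                               String bigTableAlias) {
        long total = 0L;
        for (String alias : joinInputAliases) {
            if (alias.equals(bigTableAlias)) {
                continue;
            }
            Long size = aliasToFileSize.get(alias);
            if (size == null) {
                return Long.MAX_VALUE; // unknown size: treat as too large
            }
            total += size;
        }
        return total;
    }

    // The behavior described in the report: every alias in the map except the
    // big table is counted, including tables consumed by earlier join stages.
    static long sumAllOtherTables(Map<String, Long> aliasToFileSize, String bigTableAlias) {
        long total = 0L;
        for (Map.Entry<String, Long> e : aliasToFileSize.entrySet()) {
            if (!e.getKey().equals(bigTableAlias)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        final long MB = 1L << 20;
        final long GB = 1L << 30;
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("store_sales", 45 * GB);
        sizes.put("date_dim", 10 * MB);              // hypothetical size
        sizes.put("item", 20 * MB);                  // hypothetical size
        sizes.put("customer_demographics", 80 * MB); // hypothetical size
        sizes.put("store", 1 * MB);                  // hypothetical size
        sizes.put("join_intermediate", 2 * GB);      // hypothetical alias for store_sales JOIN date_dim

        // Inputs of the conditional task that joins item.
        Set<String> inputs = new HashSet<>(Arrays.asList("join_intermediate", "item"));
        long threshold = 25_000_000L; // illustrative small-table limit

        long fixed = sumSmallTables(sizes, inputs, "join_intermediate");
        long buggy = sumAllOtherTables(sizes, "join_intermediate");
        System.out.println("fixed sum fits: " + (fixed <= threshold)); // true
        System.out.println("buggy sum fits: " + (buggy <= threshold)); // false
    }
}
```

With only the task's own inputs counted, item alone is compared against the limit, so the map join can be chosen. Counting every other alias pulls in store_sales and rules it out, which matches the 3 extra MR jobs reported above.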

