hive-dev mailing list archives

From "Xuefu Zhang" <xzh...@cloudera.com>
Subject Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
Date Fri, 19 Jun 2015 13:49:36 GMT


> On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 207
> > <https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line207>
> >
> >     I think in SparkWork, there couldn't be two parents connecting to the same child. UnionWork would be such a child, but SparkWork doesn't have UnionWork, if I'm not mistaken.
> >
> >     I don't think SparkPlan has a limitation of only one link between two trans. If there are two links between a parent and a child, the input will be self-unioned and the result is the input to the child.
> 
> chengxiang li wrote:
>     Take a self-join, for example: there would be 2 MapWorks connected to the same ReduceWork. If we combine these 2 MapWorks into 1, SparkPlan::connect would throw an exception during SparkPlan generation.

I see. Thanks for the explanation. However, I'm wondering if we should remove the restriction.
Otherwise, certain cases such as self-join will not take advantage of this feature, right?
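To make the trade-off concrete, here is a minimal Java sketch of the two behaviours discussed above. `PlanSketch` is a hypothetical, simplified graph and not Hive's `SparkPlan` API. In strict mode, `connect` rejects a second edge between the same parent and child, which models the restriction described for `SparkPlan::connect`. In lenient mode, a duplicated edge makes the child's input the parent's output unioned with itself, as suggested in the first comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified plan graph for illustration only; this is NOT Hive's SparkPlan API.
class PlanSketch {
  // child -> its parents, one entry per edge (a parent may repeat in lenient mode)
  private final Map<String, List<String>> parentsOf = new LinkedHashMap<>();
  // source name -> the rows it produces
  private final Map<String, List<String>> outputOf = new LinkedHashMap<>();
  private final boolean allowDuplicateEdges;

  PlanSketch(boolean allowDuplicateEdges) {
    this.allowDuplicateEdges = allowDuplicateEdges;
  }

  void addSource(String name, List<String> rows) {
    outputOf.put(name, rows);
  }

  void connect(String parent, String child) {
    List<String> parents = parentsOf.computeIfAbsent(child, k -> new ArrayList<>());
    if (!allowDuplicateEdges && parents.contains(parent)) {
      // Strict mode: models the restriction described for SparkPlan::connect.
      throw new IllegalStateException("Duplicate edge " + parent + " -> " + child);
    }
    parents.add(parent);
  }

  // The child's input is the concatenation of its parents' outputs, one copy per edge,
  // so a duplicated edge produces a self-union of that parent's output.
  List<String> inputOf(String child) {
    List<String> input = new ArrayList<>();
    for (String parent : parentsOf.getOrDefault(child, Collections.<String>emptyList())) {
      input.addAll(outputOf.getOrDefault(parent, Collections.<String>emptyList()));
    }
    return input;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("a", "b");

    // Self-join after combining: Map 1 must feed Reducer 3 twice.
    PlanSketch strict = new PlanSketch(false);
    strict.addSource("Map 1", rows);
    strict.connect("Map 1", "Reducer 3");
    try {
      strict.connect("Map 1", "Reducer 3");
    } catch (IllegalStateException e) {
      System.out.println("strict: " + e.getMessage());
    }

    PlanSketch lenient = new PlanSketch(true);
    lenient.addSource("Map 1", rows);
    lenient.connect("Map 1", "Reducer 3");
    lenient.connect("Map 1", "Reducer 3");
    System.out.println("lenient: " + lenient.inputOf("Reducer 3")); // prints "lenient: [a, b, a, b]"
  }
}
```

In a self-join, `Map 1` and `Map 2` both feed `Reducer 3`. Once `Map 2` is folded into `Map 1`, the plan needs the edge `Map 1 -> Reducer 3` twice, and only the lenient mode accepts that.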


> On June 19, 2015, 3:42 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java, line 157
> > <https://reviews.apache.org/r/34757/diff/2/?file=986303#file986303line157>
> >
> >     Could parents be null, in case of top-level works? Same for children.
> 
> chengxiang li wrote:
>     SparkWork always returns a non-null List now, but that may change, so it does no harm to add a null check.

Yeah, if that's the case, the original code is cleaner and easier to read. If that changes,
the tests might just catch the NPE.
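The two styles being weighed can be sketched as follows. `WorkGraph` and `NullSafetySketch` are hypothetical stand-ins, not Hive classes. `getParents` is assumed to behave like the parent lookup discussed in the thread, and children would be looked up and guarded the same way.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helpers illustrating the two styles discussed; not Hive code.
final class NullSafetySketch {
  private NullSafetySketch() {}

  // Style the reviewer prefers: trust the current contract that the list is never null.
  // If that contract changes, this throws a NullPointerException that tests should surface.
  static boolean isTopLevel(WorkGraph graph, String work) {
    return graph.getParents(work).isEmpty();
  }

  // Defensive style: treat a null list as "no neighbours" before using it.
  static <T> List<T> orEmpty(List<T> list) {
    return list == null ? Collections.<T>emptyList() : list;
  }

  static boolean isTopLevelDefensive(WorkGraph graph, String work) {
    return orEmpty(graph.getParents(work)).isEmpty();
  }

  public static void main(String[] args) {
    // A graph whose lookup returns null for "Map 1" to simulate a changed contract.
    WorkGraph graph = work -> "Map 1".equals(work) ? null : Collections.singletonList("Map 1");
    System.out.println(isTopLevelDefensive(graph, "Map 1"));     // prints "true"
    System.out.println(isTopLevelDefensive(graph, "Reducer 2")); // prints "false"
    try {
      isTopLevel(graph, "Map 1");
    } catch (NullPointerException e) {
      System.out.println("NPE surfaced");
    }
  }
}

// Hypothetical stand-in for a work graph's parent lookup; not Hive's SparkWork API.
// Children would be looked up and guarded the same way.
interface WorkGraph {
  List<String> getParents(String work);
}
```

A middle ground is `java.util.Objects.requireNonNull(graph.getParents(work), "parents")`. It keeps the simple code path but fails fast with a clearer message if the non-null contract ever changes.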


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
-----------------------------------------------------------


On June 19, 2015, 7:22 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34757/
> -----------------------------------------------------------
> 
> (Updated June 19, 2015, 7:22 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10844
>     https://issues.apache.org/jira/browse/HIVE-10844
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Some Hive queries (like TPC-DS Q39) may share the same subquery, which is translated into separate but equivalent Works in SparkWork. Combining these equivalent Works into a single one would let the query benefit from the subsequent dynamic RDD caching optimization.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
>   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
>   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
>   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
>   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
>   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d 
>   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
>   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
>   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
>   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
>   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
>   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
>   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
>   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
>   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
>   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
>   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
>   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
>   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
>   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
>   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
>   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
>   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
>   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
>   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
>   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
>   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
>   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
>   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
>   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
>   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
>   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
>   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
> 
> Diff: https://reviews.apache.org/r/34757/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>
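The combining step described in the review request can be sketched roughly as shown below. This is a hedged illustration, not the `CombineEquivalentWorkResolver` from the patch. `Work` is a hypothetical node type, and its `signature` string stands in for whatever operator-tree comparison the patch performs. The diff adds an `OperatorComparatorFactory`, but its behaviour is not shown here. In this sketch, two works are treated as equivalent when their signatures match and they have the same parents. As an assumption that mirrors the self-join restriction discussed above, works that share a child are left alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of combining equivalent works; NOT the patch's CombineEquivalentWorkResolver.
final class CombineSketch {
  private CombineSketch() {}

  static void link(Work parent, Work child) {
    parent.children.add(child);
    child.parents.add(parent);
  }

  // Walks works in topological order and folds each work into an earlier equivalent one:
  // same signature, same parents, and (assumption) no shared children, since a shared child
  // would need a duplicate edge, as in the self-join case.
  static List<Work> combine(List<Work> topoOrder) {
    List<Work> kept = new ArrayList<>();
    for (Work w : topoOrder) {
      Work match = null;
      for (Work k : kept) {
        if (k.signature.equals(w.signature) && k.parents.equals(w.parents)
            && Collections.disjoint(k.children, w.children)) {
          match = k;
          break;
        }
      }
      if (match == null) {
        kept.add(w);
        continue;
      }
      // Redirect w's children to the surviving work and detach w from its parents.
      for (Work child : w.children) {
        child.parents.remove(w);
        child.parents.add(match);
        match.children.add(child);
      }
      for (Work parent : w.parents) {
        parent.children.remove(w);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Two identical scans feeding different reducers, as with a shared subquery.
    Work m1 = new Work("Map 1", "scan t");
    Work m2 = new Work("Map 2", "scan t");
    Work r3 = new Work("Reducer 3", "agg A");
    Work r4 = new Work("Reducer 4", "agg B");
    link(m1, r3);
    link(m2, r4);
    for (Work w : combine(Arrays.asList(m1, m2, r3, r4))) {
      System.out.println(w.name + " -> " + w.children.size() + " child(ren)");
    }
  }
}

// Hypothetical work node; `signature` stands in for an operator-tree comparison.
final class Work {
  final String name;
  final String signature;
  final Set<Work> parents = new LinkedHashSet<>();
  final Set<Work> children = new LinkedHashSet<>();

  Work(String name, String signature) {
    this.name = name;
    this.signature = signature;
  }
}
```

The sketch processes works in topological order. Each work's parent set has therefore already been rewritten by the time it is compared, so merges cascade down the graph: two identical chains collapse into one.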

