hive-dev mailing list archives

From "Chao Sun" <chao....@cloudera.com>
Subject Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
Date Mon, 20 Oct 2014 21:59:53 GMT


> On Oct. 20, 2014, 9:52 p.m., Xuefu Zhang wrote:
> > itests/src/test/resources/testconfiguration.properties, line 509
> > <https://reviews.apache.org/r/26706/diff/7/?file=726397#file726397line509>
> >
> >     We might need to change this as well.

Can't believe I missed this. Sorry for the sloppiness!


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57445
-----------------------------------------------------------


On Oct. 20, 2014, 9:10 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26706/
> -----------------------------------------------------------
> 
> (Updated Oct. 20, 2014, 9:10 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8436
>     https://issues.apache.org/jira/browse/HIVE-8436
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing the unwanted branches from each clone's operator tree. Please refer to the design doc for details.
> This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork.
> This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 558dd02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
>   ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION 
>   ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe 
>   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 
>   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb 
>   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 
>   ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb 
>   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e 
>   ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c 
>   ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc 
>   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 
>   ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a 
>   ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 
>   ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 
>   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 
>   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 
>   ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 
>   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 
>   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f 
>   ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56 
>   ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 
>   ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 
>   ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 
>   ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 05d719a 
>   ql/src/test/results/clientpositive/spark/union18.q.out ce3e20c 
>   ql/src/test/results/clientpositive/spark/union19.q.out ac28e36 
>   ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1836150 
>   ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 179edd1 
> 
> Diff: https://reviews.apache.org/r/26706/diff/
> 
> 
> Testing
> -------
> 
> All multi-insertion related results were regenerated and manually checked against the old results.
> I also created a new test, "spark_multi_insert_spill_work.q", to check that splitting won't generate duplicate FSs.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

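For readers following along, the splitting approach described in the request above can be sketched roughly as follows. This is a hypothetical, simplified model, not the actual Hive code: `OpNode`, `cloneKeeping`, and `split` are illustrative names, and a real SparkWork operator tree carries far more state. The sketch only shows the core idea of producing one clone per child branch, with the other branches pruned away.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an operator in a work's operator tree.
class OpNode {
    final String name;
    final List<OpNode> children = new ArrayList<>();
    OpNode(String name) { this.name = name; }
    OpNode child(OpNode c) { children.add(c); return this; }
}

public class SplitSketch {
    // Deep-copy the tree rooted at `node`, keeping only paths that reach the
    // leaf named `keepLeaf`. Returns null when no such path exists.
    static OpNode cloneKeeping(OpNode node, String keepLeaf) {
        if (node.children.isEmpty()) {
            return node.name.equals(keepLeaf) ? new OpNode(node.name) : null;
        }
        OpNode copy = new OpNode(node.name);
        for (OpNode c : node.children) {
            OpNode kept = cloneKeeping(c, keepLeaf);
            if (kept != null) copy.children.add(kept);
        }
        return copy.children.isEmpty() ? null : copy;
    }

    // One pruned clone per leaf branch: the analogue of splitting a work
    // that feeds multiple child works into several single-branch works.
    static List<OpNode> split(OpNode root, List<String> leaves) {
        List<OpNode> clones = new ArrayList<>();
        for (String leaf : leaves) {
            clones.add(cloneKeeping(root, leaf));
        }
        return clones;
    }

    public static void main(String[] args) {
        // A table scan feeding two reduce-sink branches: TS -> {RS1, RS2}.
        OpNode root = new OpNode("TS")
            .child(new OpNode("RS1"))
            .child(new OpNode("RS2"));
        for (OpNode clone : split(root, List.of("RS1", "RS2"))) {
            // Each clone keeps the shared prefix and exactly one branch.
            System.out.println(clone.name + " -> " + clone.children.get(0).name);
        }
    }
}
```

Keeping a mapping from the original work to its clones, as the description requires, would be a small addition on top of this (e.g. a `Map<OpNode, List<OpNode>>` populated inside `split`), which is the information HIVE-8437 would then consume during SparkPlan generation.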
