hive-dev mailing list archives

From "Chao Sun" <chao....@cloudera.com>
Subject Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
Date Mon, 20 Oct 2014 22:04:51 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
-----------------------------------------------------------

(Updated Oct. 20, 2014, 10:04 p.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-8436
    https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
-------

Based on the design doc, we need to split the operator tree of a work in SparkWork if the
work is connected to multiple child works. The splitting is performed by cloning the original
work and removing unwanted branches in the operator tree. Please refer
to the design doc for details.
This process should be done right before we generate SparkPlan. We should have a utility method
that takes the original SparkWork and returns a modified SparkWork.
This process should also keep the information about the original work and its clones. Such
information will be needed during SparkPlan generation (HIVE-8437).
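
To make the clone-and-prune idea concrete, here is a minimal sketch. It is not the code in this
patch: SplitSketch, Op, prune and split are hypothetical names, Op is a toy stand-in for Hive's
operator classes, and the sketch assumes that every leaf operator has a unique name and feeds
exactly one child work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Hive classes.
final class SplitSketch {

    /** A node in a toy operator tree. */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name, Op... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }

        /** Deep copy of the subtree rooted at this operator. */
        Op copy() {
            Op c = new Op(name);
            for (Op k : children) {
                c.children.add(k.copy());
            }
            return c;
        }

        /** Names of the leaf operators reachable from this operator. */
        List<String> leaves() {
            List<String> out = new ArrayList<>();
            if (children.isEmpty()) {
                out.add(name);
            }
            for (Op k : children) {
                out.addAll(k.leaves());
            }
            return out;
        }
    }

    /**
     * Removes every branch under op that does not lead to the leaf named keep.
     * Returns false if op itself should be dropped by its parent.
     */
    static boolean prune(Op op, String keep) {
        if (op.children.isEmpty()) {
            return op.name.equals(keep);
        }
        op.children.removeIf(k -> !prune(k, keep));
        return !op.children.isEmpty();
    }

    /**
     * Produces one pruned clone of root per leaf. The returned map (leaf name to clone)
     * is the record of which clone serves which child; root itself is left untouched.
     */
    static Map<String, Op> split(Op root) {
        Map<String, Op> clones = new LinkedHashMap<>();
        for (String leaf : root.leaves()) {
            Op clone = root.copy();
            prune(clone, leaf);
            clones.put(leaf, clone);
        }
        return clones;
    }
}
```

For a tree TS -> FIL -> {RS1, RS2}, split returns two clones, TS -> FIL -> RS1 and
TS -> FIL -> RS2, while the original tree still has both branches. Keeping the original next to
the returned map is one simple way to retain the original/clone relationship mentioned above.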


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION 
  ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e 
  ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c 
  ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 
  ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a 
  ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f 
  ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 05d719a 
  ql/src/test/results/clientpositive/spark/union18.q.out ce3e20c 
  ql/src/test/results/clientpositive/spark/union19.q.out ac28e36 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1836150 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 179edd1 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
-------

All multi-insertion-related results were regenerated and manually checked against the old
results.
I also created a new test, "spark_multi_insert_spill_work.q", to check that splitting won't
generate duplicate FSs.


Thanks,

Chao Sun

