hive-dev mailing list archives

From "Xuefu Zhang" <xzh...@cloudera.com>
Subject Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
Date Wed, 15 Oct 2014 03:00:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review56640
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
<https://reviews.apache.org/r/26706/#comment97030>

    Can we try a generic method so that we only have one method doing cloning for both?
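As an illustration of the suggestion above, a single generic helper could replace two type-specific clone methods. This is a hedged sketch only, not the actual patch: it uses plain java.io serialization to stay self-contained, and `CloneUtil`/`deepClone` are invented names, not identifiers from Utilities.java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CloneUtil {
    // One generic deep-clone helper usable for any Serializable plan type,
    // standing in for the two type-specific cloning methods mentioned in
    // the review comment. Serializes the object to a byte buffer and reads
    // it back, producing an independent copy.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T deepClone(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(original);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        java.util.ArrayList<String> work =
            new java.util.ArrayList<>(java.util.Arrays.asList("Map 1", "Reduce 2"));
        java.util.ArrayList<String> copy = deepClone(work);
        copy.remove("Reduce 2"); // pruning the clone leaves the original intact
        System.out.println(work.size() + " " + copy.size()); // prints "2 1"
    }
}
```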



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java
<https://reviews.apache.org/r/26706/#comment97031>

    I think input param can be just BytesWritable.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
<https://reviews.apache.org/r/26706/#comment97033>

    I think we should use add() instead.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
<https://reviews.apache.org/r/26706/#comment97035>

    The design doc explicitly specifies that the first clone is handled differently than the
rest, but I didn't see such handling here. We may have a problem with this implementation.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java
<https://reviews.apache.org/r/26706/#comment97036>

    Let's not use * in imports.


- Xuefu Zhang


On Oct. 14, 2014, 9:17 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26706/
> -----------------------------------------------------------
> 
> (Updated Oct. 14, 2014, 9:17 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8436
>     https://issues.apache.org/jira/browse/HIVE-8436
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Based on the design doc, we need to split the operator tree of a work in SparkWork
> if the work is connected to multiple child works. The split is performed by cloning
> the original work and removing the unwanted branches from each clone's operator
> tree. Please refer to the design doc for details.
> This process should be done right before we generate the SparkPlan. We should have
> a utility method that takes the original SparkWork and returns a modified SparkWork.
> This process should also keep track of the original work and its clones, since such
> information will be needed during SparkPlan generation (HIVE-8437).
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
>   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 95d7b59 
>   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out b425c67 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out dc713b3 
>   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out cd8e85e 
>   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 801ac8a 
>   ql/src/test/results/clientpositive/spark/groupby_position.q.out b04e55c 
>   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4bde6ea 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out ab2fe84 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 5c1cbc4 
>   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
>   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
>   ql/src/test/results/clientpositive/spark/input1_limit.q.out 90bc8ea 
>   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
>   ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
>   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7964802 
>   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
>   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
>   ql/src/test/results/clientpositive/spark/multi_insert.q.out 31ebbeb 
>   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 0a983d8 
>   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 68b1312 
>   ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out f7867ac 
>   ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out dbb78a6 
>   ql/src/test/results/clientpositive/spark/orc_analyze.q.out a0af7ba 
>   ql/src/test/results/clientpositive/spark/parallel.q.out acd418f 
>   ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 169d2f1 
>   ql/src/test/results/clientpositive/spark/ppd_transform.q.out 54b8a8a 
>   ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 6f8066d 
>   ql/src/test/results/clientpositive/spark/union18.q.out 07ea2c5 
>   ql/src/test/results/clientpositive/spark/union19.q.out 2fefe8e 
>   ql/src/test/results/clientpositive/spark/union_remove_6.q.out 147f1fe 
>   ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out e12943c 
> 
> Diff: https://reviews.apache.org/r/26706/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chao Sun
> 
>
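The cloning-and-pruning process described in the quoted review request can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the actual patch: `Work`, `branches`, and `split` are invented stand-ins for Hive's BaseWork and operator-tree structures, and the real implementation deep-clones full operator trees rather than string lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SplitSketch {
    // Toy stand-in for a work whose operator tree has one branch per
    // child work. A real BaseWork carries an operator graph instead.
    static class Work {
        final String name;
        final List<String> branches;
        Work(String name, List<String> branches) {
            this.name = name;
            this.branches = branches;
        }
    }

    // Produce one pruned clone per child work: clone i keeps only the
    // branch leading to child i, and all other branches are removed.
    static List<Work> split(Work original) {
        List<Work> clones = new ArrayList<>();
        for (int i = 0; i < original.branches.size(); i++) {
            Work clone = new Work(original.name + "-clone" + i,
                    new ArrayList<>(original.branches));
            clone.branches.retainAll(
                    Collections.singletonList(original.branches.get(i)));
            clones.add(clone);
        }
        return clones;
    }

    public static void main(String[] args) {
        Work w = new Work("Map 1", Arrays.asList("to-Reduce 2", "to-Reduce 3"));
        for (Work c : split(w)) {
            System.out.println(c.name + " -> " + c.branches);
        }
    }
}
```

Keeping a mapping from the original work to its clones, as the description requires, would then be a matter of recording `original -> clones` in the utility method before returning the modified SparkWork.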

