hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Na Yang" <ny...@maprtech.com>
Subject Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
Date Sat, 06 Sep 2014 06:14:36 GMT


> On Sept. 5, 2014, 8:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1747
> > <https://reviews.apache.org/r/25176/diff/4/?file=676862#file676862line1747>
> >
> >     The if condition already checks fileSinkDesc.isLinkedFileSink(), how come in
else block, we still need to do this? Isn't fileSinkDesc.isLinkedFileSink() consistent with
fileSinkDesc.getLinkedFileSinkDesc() != null? If not, we may want to make them consistent.

The situation is a bit tricky here. The isLinkedFileSink() is not consistent with fileSinkDesc.getLinkedFileSinkDesc()
!= null. In a special case, when the isLinkedFileSink()==false, the fileSinkDesc.getLinkedFileSinkDesc()
!= null. This happens when all the union all subqueries are map work. In this case, the generated
graph is map1->union, map2->union, map3->union. Although the isLinkedFileSink() returns
false, we still need to link the multiple filesinks together because they share the same destination
directory. Otherwise, for each filesink, a set of merge and move work will get generated and
their data overwrite each other.


> On Sept. 5, 2014, 8:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1749
> > <https://reviews.apache.org/r/25176/diff/4/?file=676862#file676862line1749>
> >
> >     Also, how come we set parent dir and dir to the same location?

The same reason as the above one. In this case, the parent dir and the dir for each linkedfilesink
are the same.


> On Sept. 5, 2014, 8:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java, line 189
> > <https://reviews.apache.org/r/25176/diff/4/?file=676865#file676865line189>
> >
> >     Does it make sense to put this code block in private method?

I am fine to put this code block in a private method.


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25176/#review52498
-----------------------------------------------------------


On Sept. 4, 2014, 5:03 p.m., Na Yang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25176/
> -----------------------------------------------------------
> 
> (Updated Sept. 4, 2014, 5:03 p.m.)
> 
> 
> Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-7870
>     https://issues.apache.org/jira/browse/HIVE-7870
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
> 
> The cause of this problem is during spark/tez task generation, the union file sink operator
are cloned to two new filesink operator. The linkedfilesinkdesc info for those new filesink
operators are missing. In addition, the two new filesink operators also need to be linked
together.   
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
>   ql/src/test/queries/clientpositive/union_remove_1.q c87b3fe 
>   ql/src/test/queries/clientpositive/union_remove_10.q 6701952 
>   ql/src/test/queries/clientpositive/union_remove_11.q 4b2fa42 
>   ql/src/test/queries/clientpositive/union_remove_12.q 69d0d0a 
>   ql/src/test/queries/clientpositive/union_remove_13.q 7605f0e 
>   ql/src/test/queries/clientpositive/union_remove_14.q a4fdfc8 
>   ql/src/test/queries/clientpositive/union_remove_15.q e3c937b 
>   ql/src/test/queries/clientpositive/union_remove_16.q 537078b 
>   ql/src/test/queries/clientpositive/union_remove_17.q d70f3d3 
>   ql/src/test/queries/clientpositive/union_remove_18.q 6352bc3 
>   ql/src/test/queries/clientpositive/union_remove_19.q 8c45953 
>   ql/src/test/queries/clientpositive/union_remove_2.q 83cd288 
>   ql/src/test/queries/clientpositive/union_remove_20.q f80f7c1 
>   ql/src/test/queries/clientpositive/union_remove_21.q 8963c25 
>   ql/src/test/queries/clientpositive/union_remove_22.q b0c1ccd 
>   ql/src/test/queries/clientpositive/union_remove_23.q a1b989a 
>   ql/src/test/queries/clientpositive/union_remove_24.q ec561e0 
>   ql/src/test/queries/clientpositive/union_remove_25.q 76c1ff5 
>   ql/src/test/queries/clientpositive/union_remove_3.q 9617f73 
>   ql/src/test/queries/clientpositive/union_remove_4.q cae323b 
>   ql/src/test/queries/clientpositive/union_remove_5.q 5df84e1 
>   ql/src/test/queries/clientpositive/union_remove_6.q bfce26d 
>   ql/src/test/queries/clientpositive/union_remove_7.q 3a95674 
>   ql/src/test/queries/clientpositive/union_remove_8.q a83a43e 
>   ql/src/test/queries/clientpositive/union_remove_9.q e71f6dd 
>   ql/src/test/results/clientpositive/spark/union10.q.out 20c681e 
>   ql/src/test/results/clientpositive/spark/union18.q.out 3f37a0a 
>   ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd 
>   ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218 
>   ql/src/test/results/clientpositive/spark/union29.q.out b9546ef 
>   ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536 
>   ql/src/test/results/clientpositive/spark/union30.q.out 12717a1 
>   ql/src/test/results/clientpositive/spark/union33.q.out b89757f 
>   ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9 
>   ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4 
>   ql/src/test/results/clientpositive/spark/union_remove_10.q.out 927a15d 
>   ql/src/test/results/clientpositive/spark/union_remove_11.q.out 96651e1 
>   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 0954ae4 
>   ql/src/test/results/clientpositive/spark/union_remove_4.q.out cc46dda 
>   ql/src/test/results/clientpositive/spark/union_remove_5.q.out f6cdeb3 
>   ql/src/test/results/clientpositive/spark/union_remove_9.q.out 1f0260c 
> 
> Diff: https://reviews.apache.org/r/25176/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Na Yang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message