hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Review Request 47554: HIVE-13750
Date Thu, 19 May 2016 17:13:21 GMT


> On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java,
line 556
> > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line556>
> >
> >     Can you also add comment why this is valid only for RS added by SPDO and not
in general?
> 
> Jesús Camacho Rodríguez wrote:
>     I was thinking further about this, and indeed I made many assumptions thinking that
SPDO will not add the sort columns at the end of the additional RS keys, so then we need to
make sure the ordering is correct... but this is not true, as it will add them.
>     
>     I think this optimization can be done general, as ordering before ordering again
is a noop. I have changed the code accordingly, I would like your opinion. Am I missing something?
Let's see what QA returns...
> 
> Ashutosh Chauhan wrote:
>     yeah.. thats what I think.. this is not specific to SPDO at logical layer. 
>     Thing I am not 100% sure about is in FileSink the way SPDO is implemented I *think*
it assumes all rows for a particular key come in sorted and in one batch. Now, when we merge
and sort by two keys and if there is a case that FS was expecting all rows sorted on second
key only, now will get them sorted on 2 keys and it may have to close ORC writer and then
reopen again, as oppose to write it once and never opening it again.
> 
> Jesús Camacho Rodríguez wrote:
>     I am trying to understand this part... The way the optimization works, we will keep
the second RS with its keys, we just remove the first RS... Thus, this should not be a problem?
Or maybe I am not understanding it correctly
> 
> Ashutosh Chauhan wrote:
>     I think you are correct. Since relevant FS is active in last reducer containing RS
which we is not changing, we are not changing any assumptions for FS.

On a second thought I think there is more to it. e.g. we are transforming RS (a) followed
by RS (a,b) to RS(a,b) Than this is not valid transformation in general, only if second RS
is introduced by SPDO. e.g, if these two RS are part of two GBYs (or a join & GBy), than
its invalid because partitioning on (a,b) != partitioning on (a)


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47554/#review133844
-----------------------------------------------------------


On May 19, 2016, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47554/
> -----------------------------------------------------------
> 
> (Updated May 19, 2016, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-13750
>     https://issues.apache.org/jira/browse/HIVE-13750
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-13750
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 010c89ed978296709b052cc7bc80256a27658e2b

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
733620b84657a21829248afe72ab16ad9692f37e 
>   ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out d03bfe422743d9a5a6b85f9a6198e1e27024f129

>   ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out dec872ab0eef54bd92d5c2bc068e2805cc14e272

>   ql/src/test/results/clientpositive/dynpart_sort_optimization_acid.q.out 832580325873dee741ba86239ee571873994a808

>   ql/src/test/results/clientpositive/reducesink_dedup.q.out b89df52f965385b85894757896eee487b29c52ae

>   ql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out a90e3f63b4646cf0ade9785a501ebd1a6b2a3406

>   ql/src/test/results/clientpositive/tez/dynpart_sort_optimization.q.out 723e8192f2735059005fc3c5c96732a2c4be49c1

> 
> Diff: https://reviews.apache.org/r/47554/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message