hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yin Huai" <h...@cse.ohio-state.edu>
Subject Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization
Date Fri, 10 Feb 2012 19:11:38 GMT


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > I've started reviewing this, here's my comments so far.  I'll continue to look over
it.

I will update this patch soon. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 453
> > <https://reviews.apache.org/r/2001/diff/4/?file=71297#file71297line453>
> >
> >     Does this have to default to false, does anything break if it's true?
> >     
> >     Similarly, have you tried running the tests with this set to true?

I have not tried running the tests with this set to true. I will do it when I find a revision
which can pass all unit tests (btw, any suggestion on which revision should I use?). In my
opinion, since this optimizer is kind of complicated and it is still being developed, it will
be safer to default it to false and let users to decide when to use it than default it to
true.


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
line 101
> > <https://reviews.apache.org/r/2001/diff/4/?file=71299#file71299line101>
> >
> >     It's not clear to me why we need both setRowNumber and processOp.

Since a CorrelationCompositeOperator may have multiple parents, I used a buffer to store the
output of parents of the CorrelationCompositeOperator (shown processOp method). The TableScanOperator
will trigger the setRowNumber method and then CorrelationCompositeOperator will decide the
operationPathTags of this row based on the contents in the buffer and then forward the row
in its buffer to its child. So, setRowNumber in here is used to evaluate the operationPathTags
of the row in the buffer before the CorrelationCompositeOperator gets the new row. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
lines 150-177
> > <https://reviews.apache.org/r/2001/diff/4/?file=71299#file71299line150>
> >
> >     Putting this code in a helper method would be better than having it both here
and in setRowNumber.

I will do it. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
line 274
> > <https://reviews.apache.org/r/2001/diff/4/?file=71300#file71300line274>
> >
> >     Does this commented out code need to be kept?

This commented out code is not needed. I will delete it. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java, line 1337
> > <https://reviews.apache.org/r/2001/diff/4/?file=71303#file71303line1337>
> >
> >     I couldn't find a CorrelationFakeReduceSinkOperator class.

CorrelationLocalSimulativeReduceSinkOperator was named as CorrelationFakeReduceSinkOperator.
I will use the right name in the comment. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java, line 273
> > <https://reviews.apache.org/r/2001/diff/4/?file=71305#file71305line273>
> >
> >     Tabs are bad, could you change them to spaces, at least in the new code your
introducing.

I will change the format of my code. Thanks for letting me know.


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java,
line 239
> > <https://reviews.apache.org/r/2001/diff/4/?file=71308#file71308line239>
> >
> >     I take it from this line it's a requirement that in order for this correlation
optimization to be attempted every reduce sink has to be followed only by children with a
single child.
> >     
> >     Could this be relaxed?  Could the optimization simply not be applied if there
is an operator between two ReduceSinks that has more than one child?
> >     
> >     Also, if there is a ReduceSink which is not followed by another ReduceSink,
but is followed by an operator with more than one child, this prevents the optimization from
being used, even though it shouldn't have an effect.
> >     
> >     Also, regarding checking if the size <=1, if the size <1 the next line
will throw an exception.

Only "assert op.getChildOperators().size() > 0;" is needed at here. Thank you for letting
me know. 


> On 2012-02-10 17:38:09, Kevin Wilfong wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java,
line 335
> > <https://reviews.apache.org/r/2001/diff/4/?file=71308#file71308line335>
> >
> >     findNextChildReduceSinkOperator can return null, do you need to check for this?

findNextChildReduceSinkOperator will not return null since its input will not be the last
ReduceSinkOperator before the FileSinkOperator. For example, suppose that we have a plan tree
like (some operators)->RS1->(some operators)->RS2->(some operators)->FS. The
input of findNextChildReduceSinkOperator will not be RS2. I will add an assertion and a comment
after this line. 


- Yin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/#review4912
-----------------------------------------------------------


On 2012-01-29 17:56:48, Yin Huai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2001/
> -----------------------------------------------------------
> 
> (Updated 2012-01-29 17:56:48)
> 
> 
> Review request for hive.
> 
> 
> Summary
> -------
> 
> This optimizer exploits intra-query correlations and merges multiple correlated MapReduce
jobs into one jobs.
> 
> 
> This addresses bug HIVE-2206.
>     https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -----
> 
>   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION

>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION

>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java
PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION

>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION

>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java
PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 1237326 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1237326 
>   trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1237326 
>   trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1237326 
>   trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1237326 
>   trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1237326 
>   trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1237326 
> 
> Diff: https://reviews.apache.org/r/2001/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Yin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message