hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yin Huai" <h...@cse.ohio-state.edu>
Subject Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization
Date Thu, 10 Jan 2013 23:21:10 GMT


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java,
line 32
> > <https://reviews.apache.org/r/7126/diff/11/?file=221824#file221824line32>
> >
> >     I don't see any modifications to ql/if/queryplan.thrift Please mod that file
appropriately and add the generated code  in the patch

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java, line 36
> > <https://reviews.apache.org/r/7126/diff/11/?file=221825#file221825line36>
> >
> >     Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java, line 47
> > <https://reviews.apache.org/r/7126/diff/11/?file=221825#file221825line47>
> >
> >     Please add javadocs, explaining this class as well as the fact that there are
currently two classes which extends this. Also, add difference in behavior of those two classes
which necessitates the need for this base class.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java, line
25
> > <https://reviews.apache.org/r/7126/diff/11/?file=221826#file221826line25>
> >
> >     Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java, line
26
> > <https://reviews.apache.org/r/7126/diff/11/?file=221826#file221826line26>
> >
> >     Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java, line
155
> > <https://reviews.apache.org/r/7126/diff/11/?file=221826#file221826line155>
> >
> >     please do return "CorrelationComposite" here instead of "CCO"

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
line 224
> > <https://reviews.apache.org/r/7126/diff/11/?file=221827#file221827line224>
> >
> >     It seems like you are serializing and then immediately deserializing keys and
values here, which I think is required for ReduceSinkOperator since keys and values are transferred
from mapper process to reducer process. This is redundant in CLSReduceSinkOp since its all
running inline in one operator pipeline in same memory. So, its looks like this could be avoided.

> >     I guess doing this keeps implementation easier, but if this is true, we should
take this up in follow-up jira as performance improvement.

yes, it was for easier implementation. I will add a comment indicating it will be addressed
in a follow-up jira.


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
line 275
> > <https://reviews.apache.org/r/7126/diff/11/?file=221827#file221827line275>
> >
> >     Does this imply CLSRSOperator cannot have more than one child operator in any
case. If so, can you please add comments stating that along with small description why is
that?

I thought that, in the original plan, a ReduceSinkOperator can only have 1 child. Because
a CLSRSOperator replaces a ReduceSinkOperator, it also has a single child. Is my understanding
correct?


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
line 285
> > <https://reviews.apache.org/r/7126/diff/11/?file=221827#file221827line285>
> >
> >     Please add comments here about why its overridden for empty implementation and
how startGroup() is taken care of in processOp()

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
line 290
> > <https://reviews.apache.org/r/7126/diff/11/?file=221827#file221827line290>
> >
> >     Please add comments here about why its overridden for empty implementation and
how endGroup() is dealt with in processOp()

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
line 294
> > <https://reviews.apache.org/r/7126/diff/11/?file=221827#file221827line294>
> >
> >     Please add comments here about why its overridden for empty implementation and
how this is taken care of in processOp()

added a comment to explain this method.


- Yin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review15124
-----------------------------------------------------------


On Nov. 19, 2012, 7:51 p.m., Yin Huai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7126/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2012, 7:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> This optimizer exploits intra-query correlations and merges multiple correlated MapReduce
jobs into one jobs. Open a new request since I have been working on hive-git.
> 
> 
> This addresses bug HIVE-2206.
>     https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 
>   conf/hive-default.xml.template f332f3a 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
7c4c413 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
>   ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
>   ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
>   ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
>   ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
>   ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 
> 
> Diff: https://reviews.apache.org/r/7126/diff/
> 
> 
> Testing
> -------
> 
> All tests pass.
> 
> 
> Thanks,
> 
> Yin Huai
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message