hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan" <hashut...@apache.org>
Subject Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization
Date Thu, 10 Jan 2013 02:24:04 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review15124
-----------------------------------------------------------



ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
<https://reviews.apache.org/r/7126/#comment32714>

    I don't see any modifications to ql/if/queryplan.thrift Please mod that file appropriately
and add the generated code  in the patch



ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32715>

    Unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32716>

    Please add javadocs, explaining this class as well as the fact that there are currently
two classes which extends this. Also, add difference in behavior of those two classes which
necessitates the need for this base class.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment32811>

    Unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment32812>

    Unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment32813>

    please do return "CorrelationComposite" here instead of "CCO"



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32722>

    It seems like you are serializing and then immediately deserializing keys and values here,
which I think is required for ReduceSinkOperator since keys and values are transferred from
mapper process to reducer process. This is redundant in CLSReduceSinkOp since its all running
inline in one operator pipeline in same memory. So, its looks like this could be avoided.

    I guess doing this keeps implementation easier, but if this is true, we should take this
up in follow-up jira as performance improvement.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32810>

    Does this imply CLSRSOperator cannot have more than one child operator in any case. If
so, can you please add comments stating that along with small description why is that?



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32718>

    Please add comments here about why its overridden for empty implementation and how startGroup()
is taken care of in processOp()



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32719>

    Please add comments here about why its overridden for empty implementation and how endGroup()
is dealt with in processOp()



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment32720>

    Please add comments here about why its overridden for empty implementation and how this
is taken care of in processOp()



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
<https://reviews.apache.org/r/7126/#comment32814>

    I don't get the reason of introducing rowNumber in Operator. It doesn't look its required
for optimization to take place correctly. Can you elaborate the need of this variable and
different associated methods which are introduced along with it?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/7126/#comment32815>

    Can you add comments clarifying whats the reason for this? 



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/7126/#comment32816>

    Seems like the logic of auto conversion of join to map-join is same as the one used in
CommonJoinResolver. If thats the case, instead of replicating the logic here, it will be good
to refactor that logic out of CommonJoinResolver in some util function in some util class
and then use that function here. That logic will likely change over course of time and risk
getting diverged if we duplicate instead of reuse. 



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/7126/#comment32817>

    If I am reading this correctly than, map-side groupbys are converted to reduce-side groupbys.
I don't see any fundamental reason why correlation optimizer cannot work with map-side groupbys.
Is this just the limitation of current implementation or is there any fundamental reason for
it. Please add comments whatever the case is.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/7126/#comment32828>

    Instead of changing the plan first and than undo the changes on the plan later on (in
case optimizer can not  be applied) I am wondering a safer choice could have been to clone
the plan and than do your changes on the cloned plan. If we find that plan can be optimized,
only than change the original plan. This will be a bit more safer. Thoughts?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
<https://reviews.apache.org/r/7126/#comment32837>

    Can you add comments why it is not compatible with skew-join and groupby-skew optimizations?



ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
<https://reviews.apache.org/r/7126/#comment32838>

    Explain annotation should have worked, no ? Having this working will be very useful for
debugging.


- Ashutosh Chauhan


On Nov. 19, 2012, 7:51 p.m., Yin Huai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7126/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2012, 7:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> This optimizer exploits intra-query correlations and merges multiple correlated MapReduce
jobs into one jobs. Open a new request since I have been working on hive-git.
> 
> 
> This addresses bug HIVE-2206.
>     https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 
>   conf/hive-default.xml.template f332f3a 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
7c4c413 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
>   ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
>   ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
>   ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
>   ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
>   ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 
> 
> Diff: https://reviews.apache.org/r/7126/diff/
> 
> 
> Testing
> -------
> 
> All tests pass.
> 
> 
> Thanks,
> 
> Yin Huai
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message