hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Carl Steinbach" <c...@cloudera.com>
Subject Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization
Date Mon, 24 Sep 2012 21:52:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review11858
-----------------------------------------------------------



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
<https://reviews.apache.org/r/7126/#comment25375>

    Need to add this to conf/hive-default.xml.template along with a description.



ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment25376>

    Every instance variable should be explicitly public, private, or protected. Please add
"protected" where necessary. Also, please maintain consistent order, e.g. "protected transient
.." instead of "transient protected ..."



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment25377>

    Formatting: this line needs to be indented.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment25378>

    switch "CorrelationCompositeOperator" with "Driver".



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment25379>

    s/firstRow/isFirstRow/



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment25380>

    Please don't use raw types on the left hand side or in parameter lists, e.g:
    
    List<Integer> operationPathTags = new ArrayList<Integer>();
    
    instead of
    
    ArrayList<Integer> operationPathTags = new ArrayList<Integer>();
    



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment25381>

    Should this be ignored? And if so, I think it should be logged instead of dumping the
stack to stderr.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/7126/#comment25383>

    This needs to return something other than null.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment25385>

    Formatting: indent



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment25386>

    Raw type on LHS.



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment25388>

    Log this?



ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
<https://reviews.apache.org/r/7126/#comment25389>

    This is really hard to read. Please use temporary variables instead of repeatedly calling
the same getters.



ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
<https://reviews.apache.org/r/7126/#comment25390>

    This comment doesn't add much. Please remove.



ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
<https://reviews.apache.org/r/7126/#comment25391>

    Rawtype on the LHS. Please fix all occurrences of this problem in the code that you have
added/modified.



ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
<https://reviews.apache.org/r/7126/#comment25392>

    Methods should not return rawtypes. Please return List<Integer> instead. Please
correct this issue in any other code that is modified/added in this patch.



ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
<https://reviews.apache.org/r/7126/#comment25393>

    Is it really necessary to make this public?
    



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/7126/#comment25394>

    Please remove the HTML javadoc formatting. Most of the folks are are going to read this
will look at the actual source instead of the javadoc, and for them the HTML tags are a distraction.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/7126/#comment25395>

    Formatting: variable names (with the exception of public static final variables) begin
with lowercase lettters, e.g. aliasToTabName instead of AliastoTabName.



ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java
<https://reviews.apache.org/r/7126/#comment25396>

    No raw types on LHS, and why is the classname fully qualified?



ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
<https://reviews.apache.org/r/7126/#comment25397>

    Please don't use raw types in parameter lists (and why is ArrayList fully qualified?)



ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
<https://reviews.apache.org/r/7126/#comment25398>

    Too many input parameters. Please replace this with a Builder or a no-arg constructor
and setters.



ql/src/test/queries/clientpositive/correlationoptimizer4.q
<https://reviews.apache.org/r/7126/#comment25399>

    Combining these two queries with a UNION ALL would make it easier to visually verify the
results.


- Carl Steinbach


On Sept. 24, 2012, 3:53 p.m., Yin Huai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7126/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2012, 3:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> This optimizer exploits intra-query correlations and merges multiple correlated MapReduce
jobs into one jobs. Open a new request since I have been working on hive-git.
> 
> 
> This addresses bug HIVE-2206.
>     https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040 
>   ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
>   ql/src/test/results/compiler/plan/groupby1.q.xml 4382252 
>   ql/src/test/results/compiler/plan/groupby2.q.xml eef669c 
>   ql/src/test/results/compiler/plan/groupby3.q.xml 9743480 
>   ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860 
> 
> Diff: https://reviews.apache.org/r/7126/diff/
> 
> 
> Testing
> -------
> 
> Cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, TestHBaseNegativeCliDriver,
testSynchronized in TestEmbeddedHiveMetaStore, testSynchronized in TestRemoteHiveMetaStore,
testSynchronized in TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient,
testSynchronized in TestSetUGIOnOnlyServer, and testNegativeCliDriver_local_mapred_error_cache
in TestNegativeCliDriver, since trunk failed on these tests on my machine. Also, since trunk
will generate a different order of results (rows are in a different order) for queries skewjoinopt1.q
to skewjoinopt5.q, skewjoinopt10.q, skewjoinopt15.q to skewjoinopt17.q, and skewjoinopt19.q
to skewjoinopt20.q, I cannot test these queries on my machine either. All other tests pass.
> 
> 
> Thanks,
> 
> Yin Huai
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message