hive-dev mailing list archives

From "Yin Huai" <h...@cse.ohio-state.edu>
Subject Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization
Date Tue, 25 Sep 2012 14:18:27 GMT


> On Sept. 24, 2012, 9:52 p.m., Carl Steinbach wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java, line 33
> > <https://reviews.apache.org/r/7126/diff/4/?file=159451#file159451line33>
> >
> >     No raw types on LHS, and why is the classname fully qualified?

I have removed the fully qualified name. The raw types in BaseReduceSinkDesc, CorrelationLocalSimulativeReduceSinkDesc,
and ReduceSinkDesc come from trunk and are used to expose the "clone" method (List is not cloneable).
I have removed all LHS raw types introduced by my patch.
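
To illustrate the point about "clone", here is a minimal sketch. It is not code from the patch, and the class and method names are hypothetical. It assumes only standard java.util behavior: the List interface does not declare a public clone(), while ArrayList does, so a field must be declared with the concrete type (it can still be parameterized) for clone() to be callable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from the Hive code base.
class CloneSketch {
    // Declaring the parameter as ArrayList<String> instead of List<String>
    // keeps clone() visible without falling back to a raw type.
    static ArrayList<String> cloneColumns(ArrayList<String> cols) {
        // ArrayList.clone() returns Object, so an unchecked cast is needed.
        @SuppressWarnings("unchecked")
        ArrayList<String> copy = (ArrayList<String>) cols.clone();
        return copy;
    }

    // Alternative when only the List interface is available:
    // the copy constructor also produces a shallow copy.
    static List<String> copyColumns(List<String> cols) {
        return new ArrayList<String>(cols);
    }
}
```

Both methods produce shallow copies: the new list is independent of the original, but the elements themselves are shared.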


> On Sept. 24, 2012, 9:52 p.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/correlationoptimizer4.q, line 23
> > <https://reviews.apache.org/r/7126/diff/4/?file=159462#file159462line23>
> >
> >     Combining these two queries with a UNION ALL would make it easier to visually verify the results.

In the CorrelationOptimizer test cases, every query is executed twice: the optimizer
is disabled for the first run and enabled for the second. The results of the two runs
are written to dest_co1 and dest_co2, respectively. What I actually want to do here is
check whether dest_co1 and dest_co2 are the same. Is there a good way to do that? Thanks.
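
To make the intended check concrete, here is a sketch in plain Java rather than HiveQL. It is not part of the patch, and the class and method names are hypothetical. It assumes each result row has already been read into a string, and it treats the two results as equal when they contain the same rows the same number of times, regardless of row order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a class from the Hive code base.
class ResultComparisonSketch {
    // Counts how often each row occurs, so row order is ignored
    // while duplicate rows still matter.
    static Map<String, Integer> countRows(List<String> rows) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String row : rows) {
            Integer seen = counts.get(row);
            counts.put(row, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    // True when both results hold the same rows with the same multiplicities.
    static boolean sameRows(List<String> first, List<String> second) {
        return countRows(first).equals(countRows(second));
    }
}
```

Comparing counts rather than sorted lists avoids depending on the order in which the two runs emit their rows.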


- Yin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review11858
-----------------------------------------------------------


On Sept. 24, 2012, 3:53 p.m., Yin Huai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7126/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2012, 3:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> This optimizer exploits intra-query correlations and merges multiple correlated MapReduce
> jobs into one job. I opened a new request since I have been working on hive-git.
> 
> 
> This addresses bug HIVE-2206.
>     https://issues.apache.org/jira/browse/HIVE-2206
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663
>   ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 5f08519
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 40dd949
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33ce6ca
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040
>   ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION
>   ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION
>   ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION
>   ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION
>   ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION
>   ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION
>   ql/src/test/results/compiler/plan/groupby1.q.xml 4382252
>   ql/src/test/results/compiler/plan/groupby2.q.xml eef669c
>   ql/src/test/results/compiler/plan/groupby3.q.xml 9743480
>   ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860
> 
> Diff: https://reviews.apache.org/r/7126/diff/
> 
> 
> Testing
> -------
> 
> Cannot test TestHBaseMinimrCliDriver, TestHBaseCliDriver, TestHBaseNegativeCliDriver,
> testSynchronized in TestEmbeddedHiveMetaStore, testSynchronized in TestRemoteHiveMetaStore,
> testSynchronized in TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient,
> testSynchronized in TestSetUGIOnOnlyServer, and testNegativeCliDriver_local_mapred_error_cache
> in TestNegativeCliDriver, since trunk failed on these tests on my machine. Also, since trunk
> will generate a different order of results (rows are in a different order) for queries skewjoinopt1.q
> to skewjoinopt5.q, skewjoinopt10.q, skewjoinopt15.q to skewjoinopt17.q, and skewjoinopt19.q
> to skewjoinopt20.q, I cannot test these queries on my machine either. All other tests pass.
> 
> 
> Thanks,
> 
> Yin Huai
> 
>

