pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Quang-Nhat HOANG-XUAN" <hxquangn...@gmail.com>
Subject Re: Review Request 23804: PIG-4066 An optimization for ROLLUP operation on Pig
Date Fri, 25 Jul 2014 13:00:47 GMT


> On July 24, 2014, 7:46 p.m., Cheolsoo Park wrote:
> > Quang-Nhat HOANG-XUAN, thank you very much for the patch.
> > 
> > I briefly went it over and found some high-level issues to be address-
> > 1) Please rebase your patch to trunk. New features are only committed into trunk.
> > 2) Please avoid using hard-coded strings. Change them to static variables, etc.
> > 3) Please use primitives instead of objects where possible. JobConf has set/get[Type]
methods, so no need to convert primitives to objects and call toString().
> > 4) Please avoid adding unnecessary object fields and instead use local variables.
> > 5) Please make PORollupH2IRGForEach a subclass of POForEach. This seems to save
quite a few code changes.
> > 6) Please use camel case for variable/field/method names and avoid underscores.
> > 7) Please remove "this" when unnecessary. e.g. this.rollup_position => rollup_position.
> > 8) Please do not use tabs for indentation but use 4 whitespaces. (You can set up
IDE for this.)
> > 9) Please delete all the trailing whitespaces. (You can set up IDE for this.)
> > 
> > I stopped reviewing it for now. Once you clean up your patch a bit, I will resume
my review and try to run tests.
> > 
> > Thanks!

Thank you so much!
I will fix them.


- Quang-Nhat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23804/#review48543
-----------------------------------------------------------


On July 22, 2014, 9:20 a.m., Quang-Nhat HOANG-XUAN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23804/
> -----------------------------------------------------------
> 
> (Updated July 22, 2014, 9:20 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-4066
>     https://issues.apache.org/jira/browse/PIG-4066
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> This patch aims at addressing the current limitation of the ROLLUP operator in PIG: most
of the work is done in the Map phase of the underlying MapReduce job to generate all possible
intermediate keys that the reducer use to aggregate and produce the ROLLUP output. Based on
our previous work: “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design
space of MapReduce ROLLUP aggregates” (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf),
we show that the design space for a ROLLUP implementation allows for a different approach
(in-reducer grouping, IRG), in which less work is done in the Map phase and the grouping is
done in the Reduce phase. This patch presents the most efficient implementation we designed
(Hybrid IRG), which allows defining a parameter to balance between parallelism (in the reducers)
and communication cost.
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
> The new syntax to use our ROLLUP approach:
> alias = CUBE rel BY
> { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { CUBE col_ref | ROLLUP col_ref
[PIVOT pivot_value]}
> ...]
> In case there is multiple ROLLUP operator in one CUBE clause, the last ROLLUP operator
will be executed with our approach (IRG, Hybrid IRG) while the remaining ROLLUP ahead will
be executed with the default approach.
> We have already made some experiments for comparison between our ROLLUP implementation
and the current ROLLUP. More information can be found at here: http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1579421

>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
PRE-CREATION 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
1579421 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java
PRE-CREATION 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java
1579421 
>   trunk/src/org/apache/pig/builtin/RollupDimensions.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/DotLOPrinter.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java
1579421 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1579421

>   trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java 1579421

>   trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java 1579421

>   trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/relational/LORollupH2IRGForEach.java PRE-CREATION

>   trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
1579421 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java
1579421 
>   trunk/src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/rules/RollupH2IRGOptimizer.java PRE-CREATION

>   trunk/src/org/apache/pig/parser/AliasMasker.g 1579421 
>   trunk/src/org/apache/pig/parser/AstPrinter.g 1579421 
>   trunk/src/org/apache/pig/parser/AstValidator.g 1579421 
>   trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1579421 
>   trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1579421 
>   trunk/src/org/apache/pig/parser/QueryLexer.g 1579421 
>   trunk/src/org/apache/pig/parser/QueryParser.g 1579421 
>   trunk/test/org/apache/pig/test/TestCubeOperator.java 1579421 
>   trunk/test/org/apache/pig/test/TestNewPlanLogToPhyTranslationVisitor.java 1579421 
>   trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1579421 
> 
> Diff: https://reviews.apache.org/r/23804/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Quang-Nhat HOANG-XUAN
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message