pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cheolsoo Park" <piaozhe...@gmail.com>
Subject Re: Review Request 17191: [PIG-3626] Make combiners, custom partitioners and secondary key sort work for multiple outputs
Date Thu, 23 Jan 2014 00:09:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17191/#review32581
-----------------------------------------------------------

Ship it!


Looks good. I don't mind fixing NPE in Join 7,8 tests in a separate jira after committing
this patch first.


I have few questions below.



http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java
<https://reviews.apache.org/r/17191/#comment61488>

    This is my mess and thank you for fixing it.
    
    The same fix needs to be applied to POPartitionRearrageTez too. In fact, I should have
refactor this initialization logic into a common function as in MR. I can take care of refactoring
in a separate jira.



http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/17191/#comment61484>

    This gets overwritten if tezOp.isSkewedJoin(). Why not moving it into an else block like
you did for selectOutputComparator()?



http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/17191/#comment61483>

    Do we need to set GROUP_COMPARATOR and SECONDARY_COMPARATOR when !tezOp.isUsedSecondaryKey()?


- Cheolsoo Park


On Jan. 22, 2014, 6:18 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17191/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2014, 6:18 p.m.)
> 
> 
> Review request for pig, Alex Bain, Cheolsoo Park, Daniel Dai, and Mark Wagner.
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Added support for secondary key sort. Also fixed combiners and custom partitioners to
work for multiple outputs. Fixed bugs in couple of places as I encountered them.
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/branches/tez/shims/test/hadoop23/org/apache/pig/test/MiniCluster.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/shims/test/hadoop23/org/apache/pig/test/TezMiniCluster.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/PigConfiguration.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/ColumnInfo.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/SecondaryKeyOptimizerMR.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/optimizer/SecondaryKeyOptimizer.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POReservoirSample.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecondaryKeyOptimizerTez.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompilerUtil.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullableTuple.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/MiniGenericCluster.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestAccumulator.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestCombiner.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestCustomPartitioner.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestEvalPipeline2.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestPigServer.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestSecondarySort.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestSecondarySortMR.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestSkewedJoin.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestSplitStore.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/Util.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC14.gld
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC15.gld
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC8.gld
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestSecondarySortTez.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java
1559126 
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/tez-tests 1559126 
> 
> Diff: https://reviews.apache.org/r/17191/diff/
> 
> 
> Testing
> -------
> 
> Enabled more unit tests. Memory is a problem with few tests. Also seeing cases of Tez
failing with file.out.index (map output) not found and hangs on waiting for input. Not sure
if it is with me upgrading latest tez trunk. But definitely seems to be a Tez bug. Will create
a TEZ jira once I dig a little more.
> 
> Currently skewed join e2e tests - Join_7 and Join_8 fail with a NPE with this patch.
Others pass. Will try and fix that shortly or in a different jira. Would like to get this
patch in quickly as it has been long pending and rebasing is a pain.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message