pig-dev mailing list archives

From "Daniel Dai" <dai...@gmail.com>
Subject Re: Review Request 23787: Group All followed by CROSS with default parallelism produces wrong results
Date Thu, 24 Jul 2014 00:45:27 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23787/
-----------------------------------------------------------

(Updated July 24, 2014, 12:45 a.m.)


Review request for pig.


Bugs: PIG-4057
    https://issues.apache.org/jira/browse/PIG-4057


Repository: pig


Description
-------

Summary of changes:
1. Move Tez parallelism estimation out of TezDagBuilder into ParallelismSetter, so that we can
get the estimated parallelism of the cross before creating the GFCross vertex
2. Move InputSplit generation out of TezDagBuilder into LoaderProcessor, since we need to know
the parallelism of the maps before ParallelismSetter runs
3. Set pig.cross.parallelism.hint.(operator_key) in conf
    * In Tez, this is done when we encounter the cross vertex
    * In MR, this is done when we encounter the first GFCross
4. GFCross uses pig.cross.parallelism.hint.(operator_key) to determine the number of partitions
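
For readers who have not opened the diff, the hint mechanism in items 3 and 4 can be sketched roughly as below. This is a minimal illustration, not the patch itself: java.util.Properties stands in for the Hadoop conf, and the class name, the method names, the operator keys "scope-42" and "scope-7", and the fallback value are all hypothetical. Only the key prefix comes from the description above.

```java
import java.util.Properties;

// Sketch only: java.util.Properties stands in for the job configuration.
class CrossHintSketch {
    // Key prefix from the description; the operator key is appended to it.
    static final String HINT_PREFIX = "pig.cross.parallelism.hint.";
    // Hypothetical fallback used when no hint exists for the operator.
    static final int FALLBACK_PARALLELISM = 96;

    // Builds the per-operator configuration key.
    static String hintKey(String operatorKey) {
        return HINT_PREFIX + operatorKey;
    }

    // Compile-time side: record the estimated parallelism of the cross.
    static void setHint(Properties conf, String operatorKey, int estimatedParallelism) {
        conf.setProperty(hintKey(operatorKey), Integer.toString(estimatedParallelism));
    }

    // Runtime side: look the hint up, falling back when it is absent.
    static int readHint(Properties conf, String operatorKey) {
        String value = conf.getProperty(hintKey(operatorKey));
        return value == null ? FALLBACK_PARALLELISM : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setHint(conf, "scope-42", 10);
        System.out.println(readHint(conf, "scope-42")); // hint was set
        System.out.println(readHint(conf, "scope-7"));  // no hint, fallback applies
    }
}
```

Keying the hint by operator lets a script with several CROSS operators carry a separate estimate for each one, instead of sharing a single global value.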


Diffs (updated)
-----

  trunk/src/org/apache/pig/PigConfiguration.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POGlobalRearrange.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 1612189 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/optimizers/LoaderProcessor.java PRE-CREATION 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/optimizers/ParallelismSetter.java PRE-CREATION 
  trunk/src/org/apache/pig/impl/builtin/GFCross.java 1612189 
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1612189 
  trunk/test/e2e/pig/tests/nightly.conf 1612189 
  trunk/test/org/apache/pig/test/TestGFCross.java 1612189 

Diff: https://reviews.apache.org/r/23787/diff/


Testing
-------


Thanks,

Daniel Dai

