pig-dev mailing list archives

From "Rohini Palaniswamy" <rohini.adi...@gmail.com>
Subject Re: Review Request 25912: PIG-4162: Intermediate reducer parallelism in Tez should be higher
Date Wed, 24 Sep 2014 15:27:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25912/
-----------------------------------------------------------

(Updated Sept. 24, 2014, 3:27 p.m.)


Review request for pig, Cheolsoo Park and Daniel Dai.


Bugs: PIG-4162
    https://issues.apache.org/jira/browse/PIG-4162


Repository: pig


Description
-------

The following changes are made:
    - Always estimate intermediate reducer parallelism, even if the user has specified PARALLEL.
    - intermediate reducer parallelism = Math.min(2 * userparallelism, Math.max(userparallelism,
Math.min(estimatedparallelism, Math.max(2999, PigReducerEstimator.MAX_REDUCER_COUNT_PARAM)))),
i.e. the estimated parallelism is limited to at most 2x userparallelism or 2999. The value 2999 is
hardcoded for now; it applies only to intermediate reducers and differs from the final reducer
max parallelism default of 999. It will be made configurable later if needed.
    - ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE is set to the block size
for intermediate tasks (same as the mapper behaviour) instead of InputSizeReducerEstimator.DEFAULT_BYTES_PER_REDUCER,
which defaults to 1G.
     
   The patch also has a few other minor unrelated fixes.
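
   For illustration, the clamping rule described above can be sketched as below. This is
   only a sketch under assumptions, not the code in the patch: the class, method and
   parameter names are hypothetical, the innermost step is read as a cap (Math.min), in line
   with the "not more than ... 2999" wording, and the handling of a missing PARALLEL
   (a non-positive value) is a guess.

```java
// Hypothetical sketch of the intermediate reducer parallelism rule; not the patch code.
final class IntermediateParallelismSketch {

    // Hardcoded ceiling for intermediate reducers mentioned in the description.
    static final int INTERMEDIATE_CAP = 2999;

    /**
     * @param userParallelism       value given via PARALLEL; <= 0 is assumed to mean "not specified"
     * @param estimatedParallelism  parallelism produced by the estimator
     * @param configuredMaxReducers value configured under PigReducerEstimator.MAX_REDUCER_COUNT_PARAM
     */
    static int intermediateParallelism(int userParallelism,
                                       int estimatedParallelism,
                                       int configuredMaxReducers) {
        // The ceiling is the larger of the hardcoded 2999 and the configured maximum.
        int cap = Math.max(INTERMEDIATE_CAP, configuredMaxReducers);
        int capped = Math.min(estimatedParallelism, cap);

        // Assumption: without a user-specified PARALLEL, only the cap applies.
        if (userParallelism <= 0) {
            return capped;
        }

        // Never go below the user's value and never above twice that value.
        return Math.min(2 * userParallelism, Math.max(userParallelism, capped));
    }

    public static void main(String[] args) {
        System.out.println(intermediateParallelism(10, 50, 999));     // 20
        System.out.println(intermediateParallelism(10, 5, 999));      // 10
        System.out.println(intermediateParallelism(2000, 5000, 999)); // 2999
    }
}
```

   With PARALLEL 10 and an estimate of 50, the result is 20 (twice the user value); with an
   estimate of 5, the user value of 10 is kept.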


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/Main.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezOperator.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/ParallelismSetter.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezOperDependencyParallelismEstimator.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezParallelismEstimator.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/util/ParallelConstantVisitor.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/PigImplConstants.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/tez/TezStats.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/bigdata.conf 1626640
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestAlgebraicEval.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestForEachNestedPlan.java 1626640
  http://svn.apache.org/repos/asf/pig/trunk/test/tez-tests 1626640

Diff: https://reviews.apache.org/r/25912/diff/


Testing
-------

The test-tez unit tests and the e2e tests pass.


Thanks,

Rohini Palaniswamy

