hadoop-mapreduce-issues mailing list archives

From "Iyappan Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code
Date Thu, 03 Sep 2009 10:07:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750951#action_12750951 ]

Iyappan Srinivasan commented on MAPREDUCE-318:
----------------------------------------------

+1 for testing

Cluster configuration: mapred.child.java.opts set to 512M and io.sort.factor set to 100.
The namenode heap size is 3GB and the jobtracker heap size is 1GB.

Benchmarking and functional test results:

1)
default sort on a 94-node cluster:
trunk, two attempts: 1) 2376 seconds 2) 2589 seconds
with patch, last two attempts: 1) 1408 seconds 2) 1381 seconds


2)
loadgen on a 94-node cluster:
trunk: 57 minutes
with patch, two attempts: 1) 56 minutes 9 seconds 2) 56 minutes 23 seconds

3)
gridmix2 on a 491-node cluster:
trunk: 1 hour 7 minutes
with patch, two attempts: 1) 57 minutes 2) 47 minutes

4) sort (with memory-to-memory enabled): passed

5) After starting the job with mapred.reduce.slowstart.completed.maps=1, remove some intermediate
map output and corrupt some map output. Verify that only those tasks are rerun.
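
For reference, the relative change implied by the timings in items 1-3 can be worked out with a short script. This is only a sketch: it assumes the runs for each configuration are averaged with a plain arithmetic mean, and the names RUNS and percent_reduction are made up for illustration. The numbers themselves are copied from the results above and converted to seconds.

```python
# Timings copied from items 1-3 above, converted to seconds.
# Assumption: runs for each configuration are averaged with an arithmetic mean.
from statistics import mean

RUNS = {
    "sort (94 nodes)": {"trunk": [2376, 2589], "patch": [1408, 1381]},
    "loadgen (94 nodes)": {"trunk": [57 * 60], "patch": [56 * 60 + 9, 56 * 60 + 23]},
    "gridmix2 (491 nodes)": {"trunk": [67 * 60], "patch": [57 * 60, 47 * 60]},
}


def percent_reduction(trunk_runs, patch_runs):
    """Percentage drop in mean wall-clock time going from trunk to patch."""
    trunk_mean = mean(trunk_runs)
    patch_mean = mean(patch_runs)
    return 100.0 * (trunk_mean - patch_mean) / trunk_mean


for name, runs in RUNS.items():
    change = percent_reduction(runs["trunk"], runs["patch"])
    print(f"{name}: {change:.1f}% less time with patch")
```

Under that averaging assumption, sort takes roughly 44% less time with the patch, gridmix2 roughly 22% less, and loadgen is essentially unchanged at about 1%. With only one or two runs per configuration these figures are indicative rather than statistically solid.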


 

> Refactor reduce shuffle code
> ----------------------------
>
>                 Key: MAPREDUCE-318
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, mapred-318-14Aug.patch,
> mapred-318-20Aug.patch, mapred-318-24Aug.patch, mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch,
> mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we should move
> it out of ReduceTask and into a separate package (org.apache.hadoop.mapred.task.reduce). Details
> to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

