hadoop-mapreduce-issues mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code
Date Tue, 22 Sep 2009 17:42:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758314#action_12758314 ]

Scott Carey commented on MAPREDUCE-318:
---------------------------------------

You may also want to note that this change improves performance significantly in some cases,
especially when there are many small to medium-sized map outputs (many more outputs
to fetch per reduce than there are TaskTrackers).
In some of my jobs, shuffle time has dropped from 60% of the job time to < 5%.

For a given job, shuffle time is FAR less sensitive to the number of maps and reducers than
it was before. 

> Refactor reduce shuffle code
> ----------------------------
>
>                 Key: MAPREDUCE-318
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, mapred-318-14Aug.patch,
> mapred-318-20Aug.patch, mapred-318-24Aug.patch, mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch,
> mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we should move
> it out of ReduceTask and into a separate package (org.apache.hadoop.mapred.task.reduce). Details
> to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

