hadoop-mapreduce-issues mailing list archives

From "ZhuGuanyin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-222) Shuffle should be refactored to a separate task by itself
Date Tue, 01 Dec 2009 05:55:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784075#action_12784075 ]

ZhuGuanyin commented on MAPREDUCE-222:

I think it would be better if the shuffle and sort phases were separated from the reduce task.

1) In the current implementation, if a reduce task fails, the rescheduled reduce has to
shuffle and sort again. For example, the shuffle and sort phases take a long time when a
reduce needs to fetch intermediate map output from 100k maps.

2) We could shuffle and sort while another job's or task's reducers are running, which would
maximize resource utilization. In the current implementation, a reduce slot is consumed while
the task is shuffling or waiting for the maps to finish.

3) We could localize the reduce task on the tasktracker that holds its shuffled data.
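
To make the three points concrete, here is a toy sketch in Python. It is not Hadoop code: ToyScheduler, run_shuffle, assign_reduce, run_reduce and the tracker names are all hypothetical, and the model only assumes that a shuffle task records which tasktracker holds its sorted output. A retried reduce then reuses that output instead of shuffling again, and no reduce is placed until the shuffle has finished.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyScheduler:
    """Toy model of the proposal. Every name here is hypothetical, not a Hadoop API."""

    # partition id -> tasktracker holding the finished shuffle output
    shuffled_on: Dict[int, str] = field(default_factory=dict)
    # partition id -> how many times the shuffle ran (to show the retry cost)
    shuffle_runs: Dict[int, int] = field(default_factory=dict)

    def run_shuffle(self, partition: int, tracker: str) -> None:
        """Record a standalone shuffle task that fetched and sorted map output."""
        self.shuffled_on[partition] = tracker
        self.shuffle_runs[partition] = self.shuffle_runs.get(partition, 0) + 1

    def assign_reduce(self, partition: int, free_trackers: List[str]) -> Optional[str]:
        """Return the tracker for the reduce, or None if it cannot be placed yet.

        The reduce is pinned to the tracker that holds its shuffled data (point 3).
        Before the shuffle finishes, nothing is assigned, so no slot is held (point 2).
        """
        home = self.shuffled_on.get(partition)
        if home is None or home not in free_trackers:
            return None
        return home

    def run_reduce(self, partition: int, free_trackers: List[str], succeeds: bool) -> bool:
        """Simulate one reduce attempt; a failed attempt leaves the shuffle output intact."""
        tracker = self.assign_reduce(partition, free_trackers)
        return tracker is not None and succeeds


sched = ToyScheduler()
before_shuffle = sched.assign_reduce(0, ["tt-1", "tt-3"])     # shuffle not done yet
sched.run_shuffle(0, "tt-3")
placed_on = sched.assign_reduce(0, ["tt-1", "tt-3"])          # pinned to the shuffle's tracker
first_try = sched.run_reduce(0, ["tt-1", "tt-3"], succeeds=False)
second_try = sched.run_reduce(0, ["tt-1", "tt-3"], succeeds=True)
```

In this model the second attempt succeeds with shuffle_runs[0] still equal to 1, which is the saving described in point 1. A real design would also have to handle the case where the tracker holding the shuffled data is lost.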

> Shuffle should be refactored to a separate task by itself
> ---------------------------------------------------------
>                 Key: MAPREDUCE-222
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-222
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Devaraj Das
> Currently, shuffle phase is part of the reduce task. The idea here is to move out the
> shuffle as a first-class task. This will improve the usage of the network since we will then
> be able to schedule shuffle tasks independently, and later on pin reduce tasks to those nodes.
> This will make most sense for apps where there are multiple waves of reduces (the second wave
> of reduces can directly start off doing the "reducer" phase).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
