hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5223) Refactor reduce shuffle code
Date Wed, 11 Feb 2009 18:40:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672713#action_12672713 ]

Owen O'Malley commented on HADOOP-5223:

Roughly, I think the flow should look like:

EventFetcher -> HostPlanner -> FetcherPool -> OutputMerger

There is also a main shuffle object that tracks the progress of the shuffle. Each of these
should be a separate class. The EventFetcher gets the map completion events from the TaskTracker.
The HostPlanner keeps track of the available map outputs and the penalty box, and hands out
hosts that are ready to the fetchers. The FetcherPool is a pool of threads that do the actual
copying of data. The OutputMerger manages the in-memory and on-disk data and has a thread to
do merges.
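
To make the division of responsibilities concrete, here is a rough sketch of how the pieces
might fit together. It is not the proposed API: only the class names HostPlanner and
OutputMerger come from the description above, while ShuffleSketch, MapEvent, and every method
name are hypothetical. It assumes all completion events are known before the fetchers start,
simulates the copy by handing events straight to the merger, and leaves out the merge thread
and the progress-tracking shuffle object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShuffleSketch {

    /** Hypothetical stand-in for a map completion event. */
    static final class MapEvent {
        final String host;
        final int mapId;
        MapEvent(String host, int mapId) { this.host = host; this.mapId = mapId; }
    }

    /** Tracks available map outputs per host and a penalty box; hands out ready hosts. */
    static final class HostPlanner {
        private final Map<String, List<MapEvent>> pending = new HashMap<>();
        private final Map<String, Long> penaltyBox = new HashMap<>(); // host -> release time (ms)

        synchronized void addOutput(MapEvent e) {
            pending.computeIfAbsent(e.host, h -> new ArrayList<>()).add(e);
        }

        synchronized void penalize(String host, long untilMillis) {
            penaltyBox.put(host, untilMillis);
        }

        /** Removes and returns the outputs of one host outside the penalty box, or null. */
        synchronized List<MapEvent> takeReadyHost(long nowMillis) {
            Iterator<Map.Entry<String, List<MapEvent>>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, List<MapEvent>> entry = it.next();
                Long until = penaltyBox.get(entry.getKey());
                if (until != null && until > nowMillis) {
                    continue; // host is still penalized, try the next one
                }
                penaltyBox.remove(entry.getKey());
                List<MapEvent> outputs = entry.getValue();
                it.remove();
                return outputs;
            }
            return null;
        }
    }

    /** Collects fetched outputs; a real merger would also merge in memory and on disk. */
    static final class OutputMerger {
        private final List<Integer> fetched = new ArrayList<>();

        synchronized void add(MapEvent e) { fetched.add(e.mapId); }

        synchronized List<Integer> sortedMapIds() {
            List<Integer> copy = new ArrayList<>(fetched);
            Collections.sort(copy);
            return copy;
        }
    }

    /** Wires the stages together: events -> planner -> fetcher threads -> merger. */
    static OutputMerger runShuffle(List<MapEvent> events, int fetchers) {
        HostPlanner planner = new HostPlanner();
        OutputMerger merger = new OutputMerger();
        for (MapEvent e : events) {
            planner.addOutput(e); // the EventFetcher's role, done up front here
        }
        ExecutorService pool = Executors.newFixedThreadPool(fetchers); // the FetcherPool's role
        for (int i = 0; i < fetchers; i++) {
            pool.submit(() -> {
                List<MapEvent> outputs;
                while ((outputs = planner.takeReadyHost(System.currentTimeMillis())) != null) {
                    for (MapEvent e : outputs) {
                        merger.add(e); // stands in for copying the map output
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for fetchers", ex);
        }
        return merger;
    }
}
```

In this sketch a caller would build a list of MapEvent objects and pass it to
ShuffleSketch.runShuffle(events, n). Keeping the penalty box inside the planner means the
fetcher threads never need to know why a host was withheld.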

We'll post a patch with the API soon.

> Refactor reduce shuffle code
> ----------------------------
>                 Key: HADOOP-5223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5223
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.21.0
> The reduce shuffle code has become very complex and entangled. I think we should move
> it out of ReduceTask and into a separate package (org.apache.hadoop.mapred.task.reduce). Details
> to follow.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
