hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2568) Pin reduces with consecutive IDs to nodes and have a single shuffle task per job per node
Date Fri, 01 Feb 2008 11:09:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564724#action_12564724 ]

Amar Kamat commented on HADOOP-2568:

Should we try doing this in steps?
1) First, fetch all the map outputs for a reducer from one node in one shot.
2) Then extract the shuffler from each reducer and have a common shuffler for all the reducers
on that node. This would mean having 3 tasks {{mapper, shuffler, reducer}}, no?
Introducing a shuffler will be a big change in terms of code and design, while combining/piggybacking
the map outputs for one reducer will be a comparatively smaller change.
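
To make step 1 concrete, here is a minimal sketch of how a reducer could batch its fetches per map node. It is an illustration only, not Hadoop code: the function names and the `(map_id, host)` pair shape are assumptions made for this example.

```python
from collections import defaultdict


def group_map_outputs_by_host(map_locations):
    """Group map IDs by the host that holds their output.

    map_locations: iterable of (map_id, host) pairs (hypothetical shape).
    """
    by_host = defaultdict(list)
    for map_id, host in map_locations:
        by_host[host].append(map_id)
    return dict(by_host)


def plan_fetches(reduce_id, map_locations):
    """Return one fetch request per host instead of one per map output.

    Each request is (host, reduce_id, [map_ids]), i.e. "send me this
    reducer's partition from all of these map outputs in one shot".
    """
    grouped = group_map_outputs_by_host(map_locations)
    return [(host, reduce_id, sorted(ids)) for host, ids in sorted(grouped.items())]


locations = [("m1", "nodeA"), ("m2", "nodeB"), ("m3", "nodeA")]
requests = plan_fetches(5, locations)
```

With three map outputs spread over two nodes, the reducer issues two requests rather than three. The saving grows with the number of maps that ran on the same node.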

> Pin reduces with consecutive IDs to nodes and have a single shuffle task per job per node
> -----------------------------------------------------------------------------------------
>                 Key: HADOOP-2568
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2568
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.17.0
> The idea is to reduce disk seeks while fetching the map outputs. If we opportunistically
> pin reduces with consecutive IDs (like 5, 6, 7 .. max-reduce-tasks on that node) on a node,
> and have a single shuffle task, we should benefit, if for every fetch, that shuffle task fetches
> all the outputs for the reduces it is shuffling for. In the case where we have 2 reduces per
> node, we will decrease the #seeks in the map output files on the map nodes by 50%. Memory
> usage by that shuffle task would be proportional to the number of reduces it is shuffling
> for (to account for the number of ramfs instances, one per reduce). But overall it should
> Thoughts?
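
The 50% figure in the description can be checked with a small back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: it assumes one disk seek per fetch request against a map output file, and that the partitions for consecutive reduce IDs sit next to each other in that file so one request can read them in a single pass. The function names are hypothetical.

```python
import math


def seeks_per_map_output(num_reduces, reduces_per_node):
    """Seeks into one map output file when each node's single shuffle task
    fetches the partitions of all its (consecutive-ID) reduces at once.

    Assumption: one seek per fetch request.
    """
    return math.ceil(num_reduces / reduces_per_node)


def seek_reduction(num_reduces, reduces_per_node):
    """Fraction of seeks saved relative to one fetch per reduce."""
    baseline = num_reduces  # every reduce fetches its own partition
    return 1 - seeks_per_map_output(num_reduces, reduces_per_node) / baseline


# 8 reduces, 2 pinned per node: 4 seeks per map output instead of 8.
saved = seek_reduction(8, 2)
```

Under this model the saving is 1 - 1/k for k reduces per node when k divides the number of reduces evenly, which gives the 50% quoted above for k = 2.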

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
