hadoop-mapreduce-issues mailing list archives

From "Ben Podgursky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-199) Locality hints for Reduce
Date Mon, 21 Oct 2013 13:37:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800644#comment-13800644 ]

Ben Podgursky commented on MAPREDUCE-199:

Hey Harsh.  The delay was because I've only worked with MR1 so far (Cloudera Hadoop 4) and all
of my source suggestions were in the context of MR1, so I spent a bit of time checking out
what changed in the source between MR1 and MR2.

After looking around, your patch seems like a pretty nice way of enabling this functionality
without baking anything else into the API or complicating the code (since it bootstraps on
locality logic which already exists).

The other alternative I was thinking about was making the logic pluggable via the JobConf,
similar to how partitioners are set, e.g.


Where MyLocalityLogic would have logic for assigning task -> host.  I'm not really sure
how it would work though, since (1) I'm not sure whether user code is on the classpath at the
time tasks are assigned to nodes and (2) the locality logic would need to be presented with
a whole network topology to be able to do anything intelligent, and I'm not sure where that
would come from...
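
For illustration only, here is a rough, standalone sketch of what such a pluggable hook might look like. Nothing in it is an existing Hadoop API: the ReduceLocalityLogic interface, its preferredHosts method, and the hintsFor helper are all hypothetical names, and the flat list of host names is just an assumed stand-in for whatever topology information the scheduler could supply. Only the name MyLocalityLogic comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReduceLocalitySketch {

    /** Hypothetical plug-in point. This interface is NOT part of Hadoop. */
    public interface ReduceLocalityLogic {
        /** Returns the preferred hosts for the reducer that handles the given partition. */
        List<String> preferredHosts(int partition, int numPartitions, List<String> clusterHosts);
    }

    /** Example user implementation: spread partitions round-robin over the known hosts. */
    public static class MyLocalityLogic implements ReduceLocalityLogic {
        @Override
        public List<String> preferredHosts(int partition, int numPartitions, List<String> clusterHosts) {
            List<String> hints = new ArrayList<>();
            if (!clusterHosts.isEmpty()) {
                hints.add(clusterHosts.get(partition % clusterHosts.size()));
            }
            return hints; // an empty list means "no hint, let the scheduler decide"
        }
    }

    /**
     * Hypothetical stand-in for the scheduler side: instantiate the configured
     * class by reflection and ask it for hints. This step only works if the
     * user's class is on the classpath of whatever component assigns tasks.
     */
    public static List<String> hintsFor(Class<? extends ReduceLocalityLogic> logicClass,
                                        int partition, int numPartitions, List<String> clusterHosts) {
        try {
            ReduceLocalityLogic logic = logicClass.getDeclaredConstructor().newInstance();
            return logic.preferredHosts(partition, numPartitions, clusterHosts);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + logicClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Assumed host names, standing in for real topology information.
        List<String> hosts = Arrays.asList("node1", "node2", "node3");
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + " -> " + hintsFor(MyLocalityLogic.class, p, 4, hosts));
        }
    }
}
```

The sketch makes both open questions visible: the reflective instantiation in hintsFor is where concern (1) would bite, and the clusterHosts parameter is a placeholder for the topology information in concern (2).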

> Locality hints for Reduce
> -------------------------
>                 Key: MAPREDUCE-199
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-199
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Benjamin Reed
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-199.patch, MAPREDUCE-199.patch
> It would be nice if we could add a method to OutputFormat that would allow a job to indicate
where a reducer for a given partition should run. This is similar to the getSplits()
method on InputFormat. In our application the reducer is using other data in addition to the
map outputs during processing and data accesses could be made more efficient if the JobTracker
scheduled the reducers to run on specific hosts.

This message was sent by Atlassian JIRA
