hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2038) Making reduce tasks locality-aware
Date Fri, 27 Aug 2010 22:21:55 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903676#action_12903676 ]

Todd Lipcon commented on MAPREDUCE-2038:
----------------------------------------

bq. Do you mean that for aggregation operations that would reduce data-volume along the way,
so you want to do a hierarchical approach

Yep, that's the basic idea. Implementing rack-combiners as a first-class concept would be
neat, but the point above is that, given locality for reducers, we can "fake" them with
a lot less work. I don't know whether it would yield a large performance improvement, but
with this feature we could easily experiment with it.
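
To make the "fake" rack-combiner idea concrete, here is a minimal sketch in plain Java with no
Hadoop APIs. It assumes a simple sum-per-key aggregation; the names RackCombineSketch,
MapOutput, combinePerRack and mergeRacks are hypothetical and exist only for illustration.
Stage 1 stands in for reducers placed on the rack where their input lives, and stage 2 stands
in for a second pass that merges the much smaller per-rack results.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RackCombineSketch {
    /** One map-output record: the rack it was produced on, a key, and a partial count. */
    static final class MapOutput {
        final String rack;
        final String key;
        final long count;

        MapOutput(String rack, String key, long count) {
            this.rack = rack;
            this.key = key;
            this.count = count;
        }
    }

    /** Stage 1: sum counts per key within each rack (what a rack-local reducer would do). */
    static Map<String, Map<String, Long>> combinePerRack(List<MapOutput> outputs) {
        Map<String, Map<String, Long>> perRack = new HashMap<>();
        for (MapOutput o : outputs) {
            perRack.computeIfAbsent(o.rack, r -> new HashMap<>())
                   .merge(o.key, o.count, Long::sum);
        }
        return perRack;
    }

    /** Stage 2: merge the per-rack partial sums into final totals. */
    static Map<String, Long> mergeRacks(Map<String, Map<String, Long>> perRack) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> partial : perRack.values()) {
            partial.forEach((k, v) -> totals.merge(k, v, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data: five map-output records spread over two racks.
        List<MapOutput> outputs = List.of(
            new MapOutput("rack1", "a", 3),
            new MapOutput("rack1", "a", 2),
            new MapOutput("rack1", "b", 1),
            new MapOutput("rack2", "a", 3),
            new MapOutput("rack2", "b", 4));

        Map<String, Long> totals = mergeRacks(combinePerRack(outputs));
        System.out.println("a=" + totals.get("a") + " b=" + totals.get("b"));
    }
}
```

In this toy run only the per-rack partial sums (two keys per rack) would need to cross racks,
rather than all five raw records. Whether that saving matters on a real cluster is exactly the
open question above.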

> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into consideration
> when it decides to launch reduce tasks. There are several cases where this can be sub-optimal.
> - The map output data for a particular reduce task is not distributed evenly across
> different racks. This can happen when the job does not have many maps, or when there is
> heavy skew in the map output data.
> - A reduce task may need to access some side file (e.g. Pig fragmented join, or incremental
> merge of an unsorted smaller dataset with an already sorted large dataset). It'd be useful to
> place reduce tasks based on the location of the side files they need to access.
> This jira is created for the purpose of soliciting ideas on how we can make it better.
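
As one illustration of the first case in the description (uneven map output across racks),
the sketch below picks a preferred rack for a reduce task from the number of map-output bytes
each rack holds for that task. This is a hypothetical heuristic, not anything the Hadoop
scheduler is known to implement; the class name, the preferredRack method and the minShare
threshold are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;

public class ReducePlacementSketch {
    /**
     * Returns the rack holding the largest share of this reduce task's input,
     * but only if that share is at least minShare (e.g. 0.6 for 60%).
     * Returns empty when the data is spread too evenly for locality to matter.
     * If two racks tie for the maximum, whichever the map iterates first is used.
     */
    static Optional<String> preferredRack(Map<String, Long> bytesPerRack, double minShare) {
        long total = 0;
        String best = null;
        long bestBytes = -1;
        for (Map.Entry<String, Long> e : bytesPerRack.entrySet()) {
            total += e.getValue();
            if (e.getValue() > bestBytes) {
                best = e.getKey();
                bestBytes = e.getValue();
            }
        }
        if (total == 0) {
            return Optional.empty(); // no input at all, so no preference
        }
        double share = (double) bestBytes / total;
        return share >= minShare ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up numbers: a skewed partition and an evenly spread one.
        System.out.println(preferredRack(Map.of("rack1", 900L, "rack2", 100L), 0.6));
        System.out.println(preferredRack(Map.of("rack1", 500L, "rack2", 500L), 0.6));
    }
}
```

A threshold like minShare keeps the scheduler free to place the task anywhere when no rack
dominates. The side-file case could in principle feed the same function by counting side-file
bytes per rack instead.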

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

