hadoop-mapreduce-issues mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-989) Allow segregation of DistributedCache for maps and reduces
Date Mon, 21 Sep 2009 23:31:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758062#action_12758062 ]

Milind Bhandarkar commented on MAPREDUCE-989:
---------------------------------------------

If, as Eric suggests, the tasks themselves request the cached files they need (presumably in the
configure method of the user-supplied mapper / reducer), then we lose the opportunity to overlap
populating the cache for reducers with fetching map outputs.

My request for separate cache configuration variables for map and reduce tasks is consistent
with the basic observation that map and reduce runtime requirements are different. This observation
has resulted in several recent additions to the configuration variables, such as specifying different
child.java.opts, different ulimits, different task runners, etc. for these
two types of tasks. So it is imperative that users be able to provide different cache files and archives
for the two types of tasks too.

This cannot live in the user-provided code, because then hadoop streaming, pipes,
and pig would each have to be modified to implement that functionality in the wrappers they provide.
Having one implementation provided by the framework seems to me the best way to go.
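
To make the proposal concrete, here is a minimal sketch of how a single framework-side lookup might resolve cache files per task type. It is not taken from the Hadoop source. The keys mapred.map.cache.files and mapred.reduce.cache.files are hypothetical names invented for illustration, as are the class name and the example paths; falling back to the shared mapred.cache.files list when no per-type key is set is also an assumption, and java.util.Properties stands in for the job configuration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/**
 * Sketch only: picks the cache file list for a task type from a job configuration.
 * The per-type keys are HYPOTHETICAL and do not come from the Hadoop source.
 */
public class CacheFileResolver {

    public enum TaskType { MAP, REDUCE }

    // Job-wide key used by DistributedCache for cache files.
    public static final String SHARED_KEY = "mapred.cache.files";
    // Hypothetical per-task-type keys, invented for this illustration.
    public static final String MAP_KEY = "mapred.map.cache.files";
    public static final String REDUCE_KEY = "mapred.reduce.cache.files";

    /** Returns the per-type list if set, otherwise falls back to the shared list (assumed policy). */
    public static List<String> resolve(Properties conf, TaskType type) {
        String key = (type == TaskType.MAP) ? MAP_KEY : REDUCE_KEY;
        String value = conf.getProperty(key, conf.getProperty(SHARED_KEY, ""));
        if (value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        // Entries are comma-separated; tolerate whitespace around the commas.
        return Arrays.asList(value.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SHARED_KEY, "hdfs:///cache/common.dat");
        conf.setProperty(REDUCE_KEY, "hdfs:///cache/lookup.idx,hdfs:///cache/dict.txt");
        System.out.println("map:    " + resolve(conf, TaskType.MAP));
        System.out.println("reduce: " + resolve(conf, TaskType.REDUCE));
    }
}
```

Because the lookup depends only on the job configuration and the task type, wrappers such as streaming, pipes, and pig would only need to pass the settings through rather than implement the selection themselves.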

> Allow segregation of DistributedCache for maps and reduces
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-989
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-989
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Arun C Murthy
>
> Applications might have differing needs for files in the DistributedCache wrt maps and
> reduces. We should allow them to specify them separately.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

