hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0
Date Fri, 18 Mar 2011 21:56:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008648#comment-13008648 ]

Chris Douglas commented on MAPREDUCE-279:

bq. Why not contain a ContainerLaunchContext to specify the container in which to run the
AM? Seems like lots of duplicated fields.
Agreed. Fixing this also addresses the problem that the URL is insufficient to describe
resources. The _todo form was introduced to effect this, and it remains in progress.

bq. how does one access stderr/stdout contents? both while they're being written and after
a container has terminated? (maybe I just haven't gotten to that bit yet somewhere else)
This is still a TODO (I'm working on it now). In the short term, something similar to what
the TaskTracker (TT) does is probably sufficient.

bq. Did you consider making the ids all strings instead of ints? The pro would be that there
could be canonical formats, like "AM-<hex id>" for app masters vs "C-<hex id>"
for containers.
Some of the implementation ended up relying on a consistent mapping of int ids to strings,
so going all the way could make sense. On the other hand, parsing strings to determine relationships
between containers and applications is regrettable.
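
To make the tradeoff concrete, here is a minimal sketch (not code from the patch) of keeping the ids as ints while deriving one canonical string from them. The class name, field names, and the exact "C-<app hex>-<container hex>" rendering are assumptions for illustration; only the "AM-<hex id>" and "C-<hex id>" prefixes come from the suggestion quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerId:
    """Hypothetical id record; the field names are illustrative only."""
    app_id: int        # numeric id of the owning application
    container_id: int  # numeric id of the container within that application

    def __str__(self) -> str:
        # Single canonical rendering derived from the ints, so every
        # component that needs a string produces the same one.
        return f"C-{self.app_id:x}-{self.container_id:x}"


def app_master_name(app_id: int) -> str:
    # Canonical "AM-<hex id>" form for an application master.
    return f"AM-{app_id:x}"


def belongs_to(container: ContainerId, app_id: int) -> bool:
    # The relationship is read from a typed field, not recovered by
    # parsing the string form.
    return container.app_id == app_id
```

With this shape the string is an output format only: code that needs the container-to-application relationship never has to parse it.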

bq. the URL record is missing user/password used for http basic auth or s3n auth
Agreed, full URIs should be supported, though pushing that all the way through FileContext
and FileSystem could be painful.
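
As a rough illustration of what a full URI carries beyond scheme, host, port, and path, the sketch below splits out the user-info component with Python's standard `urllib.parse`. The `describe_resource` helper and the example URIs are hypothetical; this is not the URL record from the patch.

```python
from urllib.parse import urlsplit


def describe_resource(uri: str) -> dict:
    """Split a resource URI into the pieces a record would need to carry."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when the URI has no user-info
        "password": parts.password,  # None when no password is given
        "host": parts.hostname,
        "port": parts.port,          # None when no port is given
        "path": parts.path,
    }


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(describe_resource("http://joe:secret@example.com:8080/jars/job.jar"))
    print(describe_resource("s3n://KEY:SECRET@bucket/jars/job.jar"))
```

A record that keeps only scheme, host, port, and path would drop the `user` and `password` fields shown here, which is the gap the question points at.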

bq. just to clarify, APPLICATION visibility means "only to this application submitted by this
user". ie if joe and bob both submit MapReduce 2.x.y jobs with identical jars, it still won't
share, even if sha1s match?
Right. The target layout for the NodeManager looks roughly like this:

for x in localdir:
    $x/filecache                                   # public cache
    $x/usercache/filecache                         # private cache
    $x/usercache/$user/appcache/$appid/filecache   # application cache
    $x/usercache/$user/appcache/$appid/output      # output retained after container exits, i.e. intermediate

So when a container or application ends, cleanup can just delete those subdirs. Matching a job
jar between invocations would require one to register that resource as PUBLIC/PRIVATE. The
APPLICATION scope is more for job.xml and the like.
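
The layout described above can be sketched as a few path helpers plus a cleanup step. This is a minimal illustration under stated assumptions, not NodeManager code: the function names are hypothetical, and the paths are copied as written in this message (including the private cache path, which has no $user component here).

```python
import shutil
from pathlib import Path


def public_cache(local_dir: Path) -> Path:
    # $x/filecache
    return local_dir / "filecache"


def private_cache(local_dir: Path) -> Path:
    # $x/usercache/filecache, exactly as written in the message
    return local_dir / "usercache" / "filecache"


def app_dir(local_dir: Path, user: str, app_id: str) -> Path:
    # $x/usercache/$user/appcache/$appid
    return local_dir / "usercache" / user / "appcache" / app_id


def app_cache(local_dir: Path, user: str, app_id: str) -> Path:
    # Application-scoped cache: keyed by both user and application id.
    return app_dir(local_dir, user, app_id) / "filecache"


def app_output(local_dir: Path, user: str, app_id: str) -> Path:
    # Intermediate output retained after a container exits.
    return app_dir(local_dir, user, app_id) / "output"


def cleanup_application(local_dirs, user: str, app_id: str) -> None:
    # Ending an application only needs to remove its own subtree in
    # each local dir; the public and private caches are left alone.
    for local_dir in local_dirs:
        shutil.rmtree(app_dir(local_dir, user, app_id), ignore_errors=True)
```

Because the application cache path includes both $user and $appid, two users submitting identical jars get distinct directories, which is why APPLICATION-scoped resources are not shared even when their checksums match.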

> Map-Reduce 2.0
> --------------
>                 Key: MAPREDUCE-279
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.23.0
>         Attachments: MR-279.patch, MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt
> Re-factor MapReduce into a generic resource scheduler and a per-job, user-defined component
> that manages the application execution.

