hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0
Date Fri, 18 Mar 2011 17:08:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008517#comment-13008517 ]

Todd Lipcon commented on MAPREDUCE-279:
---------------------------------------

Hi Arun. I spent the train ride this morning looking over yarn/src/main/avro in the branch.
Here are a few comments, sorry for the somewhat stream-of-consciousness format.


- Is the correct suffix still .genavro? Thought we'd changed the name to .avroidl or something?
- Apache licenses needed on these files
- Does AvroIDL convert javadoc-style comments on records/protocols into JavaDoc on generated
code? If so we should do more of that.


- AMRMProtocol:
-- the "release" parameter to allocate is strange: (a) it seems the function is misnamed if
you can also release things as you call it, and (b) why isn't it an array<ContainerId>?
-- if you want to cancel previous resource requests, do you submit a new one with a negative
numContainers?


- ApplicationSubmissionContext:
-- would be good to have some kind of scheduler-specific parameters here? e.g. maybe a scheduler
has something beyond just "priority" (e.g. perhaps a deadline)
-- using just the URL type directly for resources - seems not quite flexible enough? e.g. one useful
construct would be a URL + checksum
-- what's resources_todo going to be?
-- passing "user" - agreed, this should be more flexible than a simple string.
-- Why not contain a ContainerLaunchContext to specify the container in which to run the AM?
Seems like lots of duplicated fields.
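
To make the "URL + checksum" idea above concrete, here is a rough Java sketch. ChecksummedResource and its fields are hypothetical names for illustration only, not anything in the branch, and SHA-1 is just an assumed choice of digest.

```java
import java.net.URI;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical type (not in the branch): a resource location paired with
// the checksum the fetched bytes are expected to have.
final class ChecksummedResource {
    final URI location;
    final String sha1Hex; // expected SHA-1, lower-case hex

    ChecksummedResource(URI location, String sha1Hex) {
        this.location = location;
        this.sha1Hex = sha1Hex.toLowerCase(Locale.ROOT);
    }

    // True if the given content hashes to the expected SHA-1.
    boolean matches(byte[] content) {
        return sha1Hex.equals(sha1Hex(content));
    }

    // Hex-encodes the SHA-1 digest of the given bytes.
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

With something like this, whoever localizes the resource could verify the download before using it, and matching checksums could also serve as a cache key.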

- ContainerManager:
-- not following YarnContainerTags - these are opaque enums, how do they get interpolated
in a string?
-- how does one access stderr/stdout contents? both while they're being written and after
a container has terminated? (maybe I just haven't gotten to that bit yet somewhere else)

- yarn-types.avro:
-- For the typesafe ID classes, do we need to specify explicit comparison orderings? I don't
know Avro behavior here.
-- Did you consider making the ids all strings instead of ints? The pro would be that there
could be canonical formats, like "AM-<hex id>" for app masters vs "C-<hex id>"
for containers. AWS does a good job of this.
-- Resource: field names should include units, like "int memoryMB"
-- what are ContainerTokens? could use some extra doc at the protocol layer here. (I assume
this is for security?)
-- The "Container" type doesn't appear 
-- the URL record is missing user/password used for http basic auth or s3n auth
-- there are some hard tabs in this file
-- ApplicationMaster:
--- httpPort seems like it would be better described as something like "httpStatusURL"?
-- LocalResourceVisibility:
--- just to clarify, APPLICATION visibility means "only to this application submitted by this
user", i.e. if joe and bob both submit MapReduce 2.x.y jobs with identical jars, the jars still
won't be shared, even if the sha1s match?
--- if bob submits the same application (i.e. MR 2.x.y) twice, do APPLICATION visibility files
get shared?
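
On the string-ID suggestion above for yarn-types.avro, a minimal Java sketch of what canonical prefixed ids could look like. YarnIds, Kind, and the method names are made up for illustration; only the "AM-<hex id>" and "C-<hex id>" formats come from the comment itself.

```java
// Hypothetical helper (not in the branch): ids rendered as "<prefix>-<hex id>".
final class YarnIds {
    enum Kind {
        APP_MASTER("AM"), CONTAINER("C");

        final String prefix;

        Kind(String prefix) {
            this.prefix = prefix;
        }
    }

    // Renders an id in canonical form, e.g. CONTAINER + 255 gives "C-ff".
    static String format(Kind kind, long id) {
        return kind.prefix + "-" + Long.toHexString(id);
    }

    // Recovers the kind from the prefix; rejects strings with an unknown prefix.
    static Kind kindOf(String s) {
        int dash = s.indexOf('-');
        if (dash > 0) {
            String prefix = s.substring(0, dash);
            for (Kind k : Kind.values()) {
                if (k.prefix.equals(prefix)) {
                    return k;
                }
            }
        }
        throw new IllegalArgumentException("Not a canonical id: " + s);
    }

    // Recovers the numeric part after validating the prefix.
    static long valueOf(String s) {
        kindOf(s);
        return Long.parseUnsignedLong(s.substring(s.indexOf('-') + 1), 16);
    }
}
```

The point of the prefix is that an id pasted into a log line or bug report is self-describing, and passing a container id where an app master id is expected can be caught by a simple check.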


> Map-Reduce 2.0
> --------------
>
>                 Key: MAPREDUCE-279
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.23.0
>
>         Attachments: MR-279.patch, MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, user-defined component
> that manages the application execution.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
