hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
Date Fri, 03 Aug 2012 22:07:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428412#comment-13428412 ]

Siddharth Seth commented on MAPREDUCE-3902:
-------------------------------------------

@Tsuyoshi: I spoke with Vinod and others about this a while ago and should have posted it
earlier. Adding this functionality to the AM in its current state is possible, but it would further
complicate some components that are already quite complicated and hard to change.

The TaskAttempt state machine is currently a mix of TaskAttempt transitions and Container
transitions. The RMContainerAllocator also handles more than it should: Nodes, Containers,
and scheduling.

The idea was to split this functionality into separate TaskAttempt, Container and Node state
machines, and to reduce what the scheduler does (including decoupling RM requests from AM
scheduling). This would make the code cleaner and make re-use, as well as other improvements
such as handling retired nodes, easier to implement.
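A minimal Java sketch of the proposed split, assuming hypothetical names throughout (AttemptState, ContainerState, Attempt, Container, runAll are illustrative, not the MR AM's actual classes or the code in the branch mentioned below). The point it shows: once the container has its own lifecycle with an IDLE state, finishing an attempt no longer ends the container, so the next attempt can be handed to it. The Node state machine and RM request handling are left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only: names are illustrative, not classes from the MR AM.
public class ContainerReuseSketch {

    enum AttemptState { NEW, RUNNING, SUCCEEDED, FAILED }

    // Container lifecycle is tracked on its own; IDLE is what makes re-use possible.
    enum ContainerState { ALLOCATED, RUNNING, IDLE, COMPLETED }

    static final class Attempt {
        final String id;
        AttemptState state = AttemptState.NEW;
        Attempt(String id) { this.id = id; }
    }

    static final class Container {
        final String id;
        ContainerState state = ContainerState.ALLOCATED;
        Attempt current;
        int attemptsRun = 0;
        Container(String id) { this.id = id; }

        // Container transition ALLOCATED/IDLE -> RUNNING; attempt transition NEW -> RUNNING.
        void assign(Attempt a) {
            if (state != ContainerState.ALLOCATED && state != ContainerState.IDLE) {
                throw new IllegalStateException(id + " cannot take work in state " + state);
            }
            current = a;
            a.state = AttemptState.RUNNING;
            state = ContainerState.RUNNING;
        }

        // The attempt ends, but the container goes back to IDLE instead of exiting.
        void attemptFinished(boolean succeeded) {
            if (state != ContainerState.RUNNING) {
                throw new IllegalStateException(id + " has no running attempt");
            }
            current.state = succeeded ? AttemptState.SUCCEEDED : AttemptState.FAILED;
            current = null;
            attemptsRun++;
            state = ContainerState.IDLE;
        }

        // Only an unused or idle container can be released back to the RM.
        void release() {
            if (state == ContainerState.RUNNING) {
                throw new IllegalStateException(id + " is still running an attempt");
            }
            state = ContainerState.COMPLETED;
        }
    }

    // Toy driver: run all attempts on a fixed pool, reusing containers round-robin.
    static Container[] runAll(int numTasks, int numContainers) {
        Deque<Attempt> pending = new ArrayDeque<>();
        for (int i = 0; i < numTasks; i++) pending.add(new Attempt("attempt_" + i));
        Container[] pool = new Container[numContainers];
        for (int i = 0; i < numContainers; i++) pool[i] = new Container("container_" + i);

        int next = 0;
        while (!pending.isEmpty()) {
            Container c = pool[next++ % numContainers];
            c.assign(pending.poll());
            c.attemptFinished(true); // pretend each attempt succeeds immediately
        }
        for (Container c : pool) c.release();
        return pool;
    }

    public static void main(String[] args) {
        // Five attempts on two containers: container_0 runs 3, container_1 runs 2.
        for (Container c : runAll(5, 2)) {
            System.out.println(c.id + " ran " + c.attemptsRun + " attempts, now " + c.state);
        }
    }
}
```

Keeping the two enums separate is the design point: attempt outcomes (SUCCEEDED/FAILED) no longer imply anything about whether the container process is still alive.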

I worked with Vinod on the state transitions and have been implementing this in bits and
pieces to see how feasible it is. The code is at https://github.com/sidseth/h2-container-reuse.
It's a bit of a mess at the moment, with lots of TODOs etc. scattered all over, but it is just
about functional. There's no explicit re-use scheduling yet, but re-use can be tested by running
a job that requires more containers than are available on the cluster (plus some config changes).

bq. the 2nd topic(combining per container) should be moved, because the change seems to be too big.
I believe this was, at least initially, meant to ensure that the output from all taskAttempts
in one container would be fetched only once by a reducer (without a common combiner). Either
way, that could be a separate JIRA.
                
> MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3902
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Arun C Murthy
>            Assignee: Siddharth Seth
>         Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps)
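One possible reading of "consider data-locality when re-using containers", sketched with hypothetical names (PendingMap, pickForIdleContainer, and a host-to-rack map rackOf are assumptions, not part of the patch): when a container on some host becomes idle, prefer a pending map whose split has a replica on that host, then one on the same rack, and only then any remaining map. This is not the policy from the patch, which per the comment above has no explicit re-use scheduling yet.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: pick the next map for an idle container by locality.
public class LocalityAwareReuseSketch {

    record PendingMap(String taskId, Set<String> splitHosts) {}

    static Optional<PendingMap> pickForIdleContainer(
            String containerHost, Map<String, String> rackOf, List<PendingMap> pending) {
        String containerRack = rackOf.get(containerHost);
        PendingMap rackLocal = null;
        for (PendingMap m : pending) {
            if (m.splitHosts().contains(containerHost)) {
                return Optional.of(m); // node-local: best possible match, take it now
            }
            if (rackLocal == null && containerRack != null) {
                for (String h : m.splitHosts()) {
                    if (containerRack.equals(rackOf.get(h))) { rackLocal = m; break; }
                }
            }
        }
        if (rackLocal != null) return Optional.of(rackLocal); // first rack-local match
        // Off-rack fallback: any pending map, or nothing if the queue is empty.
        return pending.isEmpty() ? Optional.empty() : Optional.of(pending.get(0));
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of("h1", "/r1", "h2", "/r1", "h3", "/r2");
        PendingMap m0 = new PendingMap("map_0", Set.of("h3"));
        PendingMap m1 = new PendingMap("map_1", Set.of("h2"));
        PendingMap m2 = new PendingMap("map_2", Set.of("h1", "h3"));
        // Idle container on h1: map_2 has a replica on h1 (node-local).
        System.out.println(pickForIdleContainer("h1", rackOf, List.of(m0, m1, m2))
            .map(PendingMap::taskId).orElse("none"));
        // Same container without map_2: map_1 (on h2, rack /r1) is rack-local.
        System.out.println(pickForIdleContainer("h1", rackOf, List.of(m0, m1))
            .map(PendingMap::taskId).orElse("none"));
    }
}
```

A real scheduler would also weigh whether holding an idle container is better than releasing it and asking the RM for a better-placed one; the sketch ignores that trade-off.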

        
