hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Tue, 16 Oct 2012 15:51:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477098#comment-13477098
] 

Robert Joseph Evans commented on MAPREDUCE-4495:
------------------------------------------------

I am happy to comment just on the design as well.  I brought up where code should be located
because I thought it would have a significant impact on different phases of the design.  I
have a few questions about just the design too. 

How will the WfAM handle itself crashing, having the container it is running on pulled out
from under it, having one if its child AMs crashing, or having those containers pulled out
from under them, this is particularly for the V2 and V3 architectures where the AMs are not
launched by the RM, but as simple containers, or as threads?
Will its state be stored somewhere so it can restart where it left off?
If the WfAM does restart after a crash will it try to reestablish communication with App Masters
running in other containers, will it kill them, or will it just abandon them like MR currently
does?

How will the WfAM schedule containers?  The MR AM is very simple, it does not need anything
complex, just find a free map task if there is a map container available, pick one closest
to the data, or pick a reduce task if a reduce task is available.  How do you decided which
AM etc has a higher priority or is it always FIFO?

For container reuse on MR the localized resources, the config, the JVM, and even the output
do not really need to change between tasks, because they are all doing essentially the same
things.  Containers for map tasks never become reduce tasks or vise versa.  How will the WfAM
handle this? Different jobs are likely to have different classpaths, distributed cache setups,
different command lines, etc. even if they all have the same resource request.

What type of changes are going to be required for a preexisting AM to be able to communicate
with the LocalResourceCoordinator?  Ideally you could provide a simple shim layer that would
allow the App Master to either communicate directly with the RM or go through the WfAM instead.
 This way there really becomes no change for an AppMaster that wants to work with both.

How is security going to be handled?

I am also curious about how you would see us getting to what I was talking [about previously|https://issues.apache.org/jira/browse/MAPREDUCE-4495?focusedCommentId=13427387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13427387],
and do we want to go there sooner rather then later, or in your opinion it is something we
should not do at all?  Building the DAG and doing static optimizations seems very doable with
the current design by inserting in a phase after parsing the DAG and before beginning execution
of it.  But the ones that are more interesting to me, the ones where the DAG is much more
dynamic look like they would require some significant rework to the design.
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf,
yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of running
a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage
the life cycle of this application in terms of requesting the needed resources from the RM,
and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some
of the advantages:
>  - Less number of consumed resources, since only one application master will be spawned
for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs
in the workflow (no need to request/wait for resources for every individual job from the central
RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow
(e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher systems like Pig and hive
to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message