hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Thu, 02 Aug 2012 06:20:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427134#comment-13427134 ]

Alejandro Abdelnur commented on MAPREDUCE-4495:

As I've mentioned to Arun during our chat, I think that, at least initially, the Workflow/DAG
AM should come to life within Hadoop. Later, as has been done with other projects, we can
move it to a separate project once it gains traction/adoption. This way we would not get
distracted by the bureaucracy required to bootstrap an Incubator project.

These are my reasons why the Workflow/DAG AM should (initially) be part of Hadoop:

* It is meant primarily to run workflows of MR jobs. The intention is not to implement
a general-purpose workflow engine.
* It is a server version of the JobController.
* It will most likely require changes in the MR AM (making it thread-safe and able to run
multiple MR jobs). Being in Hadoop will create the synergy to make these changes rapidly.
* It may require changes in the YARN APIs.
* Being in Hadoop, it can easily be consumed by Pig/Hive/Oozie and MR developers. If those
projects require special actions other than MR jobs, these can easily be added via plugins,
since the AM runs in user space.
* Doing it in Oozie would mean that Pig/Hive could not easily consume it, as that would
create a circular dependency among those projects.

> Workflow Application Master in YARN
> -----------------------------------
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
> It is useful to have a workflow application master capable of running a DAG of jobs. The workflow client submits a DAG request to the AM, and the AM then manages the life cycle of this application: requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, this approach has several advantages:
>  - Fewer consumed resources, since only one application master is spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources from the central RM for every individual job).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
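The core idea in the issue description, that a single AM accepts a DAG and launches each job only once the jobs it depends on have finished, retrying failures along the way, can be illustrated with a small scheduler. This is a minimal sketch under stated assumptions, not the MAPREDUCE-4495 design: it uses no YARN or MapReduce APIs, jobs are identified by name, `run_workflow` and the `run_job` callback are hypothetical, and retries are applied per job rather than per task as a simplification. Ordering uses Kahn's topological sort.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def run_workflow(
    deps: Dict[str, Set[str]],
    run_job: Callable[[str], bool],
    max_attempts: int = 3,
) -> List[str]:
    """Run each job after all of its dependencies, retrying failed jobs.

    deps maps every job name to the set of job names it depends on; every
    dependency must itself appear as a key. run_job is a hypothetical
    callback that launches one job and returns True on success.
    Returns the jobs in the order they completed.
    """
    # Count unfinished dependencies per job and record reverse edges.
    pending = {job: len(parents) for job, parents in deps.items()}
    children: Dict[str, List[str]] = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            children[parent].append(job)

    # Kahn's algorithm: jobs with no dependencies are ready immediately.
    ready = deque(sorted(job for job, n in pending.items() if n == 0))
    completed: List[str] = []

    while ready:
        job = ready.popleft()
        # Try the job up to max_attempts times before failing the workflow.
        for _ in range(max_attempts):
            if run_job(job):
                break
        else:
            raise RuntimeError(f"job {job!r} failed after {max_attempts} attempts")
        completed.append(job)
        # A child becomes ready once its last dependency has completed.
        for child in sorted(children[job]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    # Any job never released is in, or downstream of, a dependency cycle.
    if len(completed) != len(deps):
        raise ValueError("workflow contains a dependency cycle")
    return completed


if __name__ == "__main__":
    # Hypothetical four-job workflow; "join" fails on its first attempt.
    dag = {
        "extract": set(),
        "clean": {"extract"},
        "join": {"extract"},
        "report": {"clean", "join"},
    }
    attempts: Dict[str, int] = {}

    def flaky_job(name: str) -> bool:
        attempts[name] = attempts.get(name, 0) + 1
        return not (name == "join" and attempts[name] == 1)

    print(run_workflow(dag, flaky_job))  # ['extract', 'clean', 'join', 'report']
```

The sketch runs jobs one at a time for clarity. An AM along the lines proposed would presumably launch independent ready jobs (here "clean" and "join") concurrently and hand containers freed by one job to the next, which is where the resource-reuse advantages listed above would come from.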

This message is automatically generated by JIRA.

