hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Wed, 01 Aug 2012 22:17:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426955#comment-13426955 ]

Arun C Murthy commented on MAPREDUCE-4495:
------------------------------------------

I had a brief discussion with Alejandro about this; for full disclosure, I'll post it here.
I'll let Alejandro post his response, of course.

----

One of my concerns with accepting this as a new module is that we are in danger of turning
YARN into an umbrella project here. The ASF is, very rightly, concerned about this.

Hadoop YARN is, and should remain, merely the framework.

Over time we will, if successful, have several applications. We cannot become an aggregation
project for them - this would put us in the awkward situation of being a disparate
set of communities. Hadoop already went through this with HBase, ZooKeeper, Hive, Pig etc.,
and we have since remedied the concerns of the ASF by moving them out as independent TLPs.

Furthermore, if the DAG-AM is to be successful it will need to churn fast in the early days
to react to requirements of communities such as Pig, Hive, Oozie etc. and you don't want to
be tied to release schedules of Hadoop...

Also, technically, this project, if housed in Hadoop, will forever be limited to running
only MapReduce jobs as part of the workflow. That rules out Pig, Hive etc., since we cannot
take on a dependency on those projects.

Hence, my suggestion is that we consider either starting this in Oozie (I'd love to start
contributing to Oozie via this route) or we start this as a standalone project in either Apache
Incubator or Apache Extras. 

Thoughts?

----

PS: We added DistShell merely as an *example* application and if a community develops around
it, I'm happy to support moving that out too.
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master that is capable of running
> a DAG of jobs. The workflow client submits a DAG request to the AM, and the AM then manages
> the life cycle of this application: requesting the needed resources from the RM,
> and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some
> of the advantages:
>  - Fewer consumed resources, since only one application master is spawned
>    for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs
>    in the workflow (no need to request/wait for resources for every individual job from
>    the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow
>    (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like Pig and Hive
>    to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
