hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Thu, 02 Aug 2012 15:22:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427380#comment-13427380 ]

Alejandro Abdelnur commented on MAPREDUCE-4495:

@Chris, as we all know, doing a few JIRAs to add new functionality to Hadoop is completely
different from bootstrapping an incubator project. We can easily avoid circular dependencies,
thus we should. Regarding your 3 questions, only time will tell. I'm not opposed to a separate
project, I'm just worried about starting a new project without a need for it, and only time
and adoption will tell (I'm trying not to go 'project happy' here). At that point we can split
it from Hadoop, as has been done with other projects.

@Arun, for users of the DAG-AM, the NodeHandler interface is the equivalent of the Mapper/Reducer
interfaces in MapReduce. The proposed Workflow library has a NodeHandler interface which is
called when the workflow job enters a node and when it exits the node. You can define/use
as many NodeHandler implementations as you need in a workflow. If DAG-AM lives in mapreduce,
it would provide out of the box a NodeHandler implementation for handling MR jobs. Then, if Pig/Hive/Oozie
use the DAG-AM they could use the built-in MR NodeHandler for MR jobs and provide their own
NodeHandlers for other types of tasks. I've made the analogy with the JobControl not with
the intention of providing a replacement for it but to explain the functionality you'd get
with the DAG-AM. Hope this clarifies.
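
To make the enter/exit contract described above concrete, here is a minimal sketch in Java. It is not the interface from the MAPREDUCE-4495 patch: the method names (enter, exit), the Node class, the two handler classes and the runLinear driver are all hypothetical, and the handlers only record what they would do instead of submitting real jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; none of these names are taken from the actual patch.
final class WorkflowSketch {

    /** Called by the workflow engine when a workflow job enters and exits a node. */
    interface NodeHandler {
        void enter(String nodeName, List<String> log);
        void exit(String nodeName, List<String> log);
    }

    /** Stand-in for the built-in handler that would run an MR job. */
    static final class MapReduceNodeHandler implements NodeHandler {
        @Override
        public void enter(String nodeName, List<String> log) {
            log.add("MR enter " + nodeName); // a real handler would submit the MR job here
        }

        @Override
        public void exit(String nodeName, List<String> log) {
            log.add("MR exit " + nodeName); // a real handler would collect the job status here
        }
    }

    /** Stand-in for a handler that a tool such as Pig, Hive or Oozie might supply. */
    static final class ScriptNodeHandler implements NodeHandler {
        @Override
        public void enter(String nodeName, List<String> log) {
            log.add("script enter " + nodeName);
        }

        @Override
        public void exit(String nodeName, List<String> log) {
            log.add("script exit " + nodeName);
        }
    }

    /** A workflow node: a name plus the handler responsible for it. */
    static final class Node {
        final String name;
        final NodeHandler handler;

        Node(String name, NodeHandler handler) {
            this.name = name;
            this.handler = handler;
        }
    }

    /** Walks the nodes in list order, calling enter and then exit on each handler. */
    static List<String> runLinear(List<Node> nodes) {
        List<String> log = new ArrayList<>();
        for (Node node : nodes) {
            node.handler.enter(node.name, log);
            node.handler.exit(node.name, log);
        }
        return log;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("aggregate", new MapReduceNodeHandler()),
                new Node("notify", new ScriptNodeHandler()));
        for (String line : runLinear(nodes)) {
            System.out.println(line);
        }
    }
}
```

The sketch walks a simple chain of nodes; a real DAG-AM would also have to handle branching, retries and resource requests, which are left out here.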

> Workflow Application Master in YARN
> -----------------------------------
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
> It is useful to have a workflow application master, which will be capable of running
> a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage
> the life cycle of this application in terms of requesting the needed resources from the RM,
> and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some
> of the advantages:
>  - Fewer consumed resources, since only one application master will be spawned
>    for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs
>    in the workflow (no need to request/wait for resources for every individual job from the central
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow
>    (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher systems like Pig and Hive
>    to provide an optimized way of running their workflows.

