hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Fri, 03 Aug 2012 17:46:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428257#comment-13428257 ]

Arun C Murthy commented on MAPREDUCE-4495:
------------------------------------------

Alejandro, making MR AM thread-safe is a good goal. We can do that independently of the new
AM. I have opened MAPREDUCE-4513 for the same.

I don't know which other 'private' classes you need; frankly, that concerns me. It means you are
adding new requirements on MR-AM, which isn't an 'api' of that nature.

Also, if we are going that route I strongly suggest we do not import code from Oozie and merely
take JobControl api and support it. That should be a trivial exercise without adding any new
'interfaces' to MapReduce.

So, I see two options:
# Enhance JobControl api to work in AM by making MR-AM, specifically MRAppMaster, thread-safe.
This will allow for multiple objects of MRAppMaster to be created. This means there are no
new interfaces to MapReduce.
# Go the full distance, make it generic, import code from Oozie, come up with a new set of
interfaces etc. etc. and do it in a separate Incubator project.

As I indicated previously, my preference is option #2 and I have already offered help to deal
with the specifics so you and Bo can concentrate on getting the code out.

Thoughts?
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of running
> a DAG of jobs. The workflow client submits a DAG request to the AM, and the AM then manages
> the life cycle of this application: requesting the needed resources from the RM,
> and starting, monitoring, and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some
> of the advantages:
>  - Fewer consumed resources, since only one application master will be spawned
>    for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs
>    in the workflow (no need to request/wait for resources for every individual job from the
>    central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow
>    (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like Pig and Hive
>    to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
