hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Wed, 17 Oct 2012 18:02:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478083#comment-13478083 ]

Alejandro Abdelnur commented on MAPREDUCE-4495:

@eric14, thanks for your comment. As I indicated at the end of my last comment, replanning
mid-job (writing a WF in-flight) is possible with the WFLIB (at most, it may require minor
tweaks to it). My suggestion of first replacing the existing JobControl with one that runs
a workflow (WFAM or Oozie) is an initial step, which I believe would bring significant value
(I respectfully disagree with your 'does not seem very helpful') to a stable version of Pig
with minimal work. This is the same approach you've suggested for the WFAM interacting with
the MRAM via the JobClient API, so that the first cut does not require significant changes in
the MRAM. In the medium/long term I concur with you on re-planning mid-job, and I would love
to see details on the idea or a design doc.

@revans2 (Bobby), thanks again for your comments; I am following up on them below.

On *I am more curious about restarting the child AMs..*, I think it is the responsibility
of each AM implementation to define what its recovery capabilities are (clean up and restart
job from scratch or continue from a stable checkpoint).

On *The concept is great, I think that MR originally had that concept to reestablish communication
with its tasks to..*, note that we are talking at the AM level, not the task level. You'd be using
the client API of an AM to reconnect; after that it is up to the AM's capabilities. This is how
Oozie works today with WF action jobs: when Oozie goes down and comes back up, it reconnects
to Hadoop with the jobID, checks the job status and continues as appropriate.
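
The reconnect-on-restart behaviour described above can be sketched roughly as follows. This is
a minimal illustration, not Oozie's actual code: `JobStatus`, `StubJobClient`, `get_status` and
`recover_actions` are hypothetical names, and the client is an in-memory stand-in for whatever
client API a given AM exposes.

```python
from enum import Enum


class JobStatus(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class StubJobClient:
    """Hypothetical in-memory stand-in for an AM client API."""

    def __init__(self, statuses):
        self._statuses = dict(statuses)

    def get_status(self, job_id):
        # A real client would query the cluster; here we just look it up.
        return self._statuses[job_id]


def recover_actions(persisted_job_ids, client):
    """After a restart, use each persisted job ID to decide the next step.

    persisted_job_ids maps a workflow action name to the job ID that was
    recorded before the workflow engine went down.
    """
    decisions = {}
    for action, job_id in persisted_job_ids.items():
        status = client.get_status(job_id)
        if status is JobStatus.RUNNING:
            decisions[action] = "keep-waiting"
        elif status is JobStatus.SUCCEEDED:
            decisions[action] = "transition-ok"
        else:
            decisions[action] = "transition-error"
    return decisions


client = StubJobClient({
    "job_1": JobStatus.RUNNING,
    "job_2": JobStatus.SUCCEEDED,
    "job_3": JobStatus.FAILED,
})
decisions = recover_actions({"a": "job_1", "b": "job_2", "c": "job_3"}, client)
```

The only state the engine needs to persist in this sketch is the action-to-jobID mapping; what
happens to the job itself while the engine is down is left to the AM.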

On *My point is that just replacing the default container allocator..*, agreed. Last Friday
at the YARN meetup I was suggesting (for other reasons (1)) that we should add a new method to
the AM-NM protocol to be able to restart an existing container with a subset of the currently
allocated resources. On such a call the NM would return the unused resources to the RM and
restart the container as requested with the provided restart command.
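
The resource bookkeeping behind such a restart call could look roughly like this. It is only a
sketch of the proposal, not an existing YARN API: `Resources` and `restart_with_subset` are
hypothetical names, and the memory/vcore figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_mb: int
    vcores: int


def restart_with_subset(allocated, requested):
    """Return (kept, released) for a restart that shrinks a container.

    'kept' is what the restarted container runs with; 'released' is what
    the NM would hand back to the RM in the proposed protocol.
    """
    if requested.memory_mb > allocated.memory_mb or requested.vcores > allocated.vcores:
        raise ValueError("restart may only request a subset of the allocated resources")
    released = Resources(
        allocated.memory_mb - requested.memory_mb,
        allocated.vcores - requested.vcores,
    )
    return requested, released


# Illustrative numbers: shrink a 4 GB / 4 vcore container to a small footprint.
kept, released = restart_with_subset(Resources(4096, 4), Resources(512, 1))
```

Rejecting any request larger than the current allocation keeps the call a pure "shrink and
restart"; growing a container would still go through the RM.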

On *I get that you are constrained by the DAG,..*

Keeping things as they are today in the WF lib, if you have a fork, the nodes are started in the
order they are defined. If you want one AM to have priority over the others, we could easily add
a priority attribute to actions that is used on parallel runs to decide which one gets
started first.
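
A minimal sketch of how such a priority attribute could decide the start order in a fork. The
function name `fork_start_order`, the convention that a larger number means higher priority, and
the action names are assumptions made for illustration, not part of the WF lib.

```python
def fork_start_order(actions):
    """Order the actions of a fork for starting.

    actions is a list of (name, priority) pairs in definition order.
    Higher priority starts first (an assumed convention). Python's sort is
    stable, so actions with equal priority keep their definition order,
    which matches the current behaviour when every priority is the same.
    """
    return [name for name, _ in sorted(actions, key=lambda a: -a[1])]


order = fork_start_order([("pig-job", 0), ("mr-job", 5), ("hive-job", 0)])
default_order = fork_start_order([("first", 0), ("second", 0), ("third", 0)])
```

With a default priority on every action, existing workflows would start their fork nodes in
exactly the order they do today.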

On *The MRAM currently does not do anything to allow for clients..*, the WFAM's child AMs
are an implementation detail in my mind; they should not be visible to the WFAM client.

On *I know that Rob Parker and Jason Lowe..*, I'd love to get details on that.

(1) The reason was that, in the case of MR jobs, after the Map task completes we could restart
the container with a very small footprint to serve the shuffle data. By doing that we could
remove the shuffle service from the NM, which has no business being there.

> Workflow Application Master in YARN
> -----------------------------------
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf,
> It is useful to have a workflow application master, which will be capable of running
a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage
the life cycle of this application in terms of requesting the needed resources from the RM,
and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some
of the advantages:
>  - Fewer resources consumed, since only one application master will be spawned
for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs
in the workflow (no need to request/wait for resources for every individual job from the central
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow
(e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like Pig and Hive
to provide an optimized way of running their workflows.

