Date: Fri, 3 Aug 2012 15:44:03 +0000 (UTC)
From: "Alejandro Abdelnur (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <707054853.10883.1344008643668.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1686392618.117854.1343681974497.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428194#comment-13428194 ]

Alejandro Abdelnur commented on MAPREDUCE-4495:
-----------------------------------------------

I don't think PaaS and Workflow-AM are similar.
Workflow-AM aims to provide an AM that can run multiple MR jobs and do intra-AM processing, all from the same AM. This would be enough for projects that typically run multiple MR jobs as a single unit of processing, like Pig/Hive/Sqoop/Oozie.

Workflow-AM will need to tap into the MapReduce AM private classes, as the intention is to fully leverage what has been done already. It will most likely also require changes in the MapReduce AM, such as making it thread-safe and multi-MR-job safe (which I believe is not the case today).

Because of this, I think it belongs in MapReduce. Having it outside, at least during its inception, would make its development much more difficult.

That said, I have no objection, quite the opposite, to seeing how it can be generalized and moved out once we finalize the initial implementation.

> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>   Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages:
> - Fewer consumed resources, since only one application master will be spawned for the whole workflow.
> - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM).
> - More optimization opportunities in terms of collective resource requests.
> - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers).
> - This Application Master can be reused/extended by higher systems like Pig and Hive to provide an optimized way of running their workflows.
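
To make the "DAG of jobs" idea from the issue description concrete, here is a minimal sketch of how a single process might decide the order in which to run dependent jobs. It is only an illustration under stated assumptions: the class `WorkflowDagSketch`, the method `executionOrder`, and the job names are hypothetical, nothing here uses the YARN or MapReduce AM APIs, and the actual Workflow-AM design may look quite different. The ordering uses Kahn's topological sort.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not part of Hadoop, YARN, or the MapReduce AM. */
public class WorkflowDagSketch {

    /**
     * Returns the jobs in an order where each job comes after all of its
     * prerequisites (Kahn's algorithm).
     *
     * @param deps maps each job name to the jobs it depends on; every job,
     *             including those without prerequisites, must be a key
     */
    public static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();          // unmet prerequisites per job
        Map<String, List<String>> dependents = new LinkedHashMap<>();  // reverse edges
        for (String job : deps.keySet()) {
            pending.put(job, 0);
            dependents.put(job, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
            for (String prerequisite : entry.getValue()) {
                if (!deps.containsKey(prerequisite)) {
                    throw new IllegalArgumentException("Unknown job: " + prerequisite);
                }
                dependents.get(prerequisite).add(entry.getKey());
                pending.merge(entry.getKey(), 1, Integer::sum);
            }
        }

        // Jobs with no unmet prerequisites can run immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job); // a real AM would launch and monitor the job here
            for (String next : dependents.get(job)) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("Workflow contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("extract", Collections.<String>emptyList());
        deps.put("join", Arrays.asList("extract"));
        deps.put("aggregate", Arrays.asList("join"));
        System.out.println(executionOrder(deps));
    }
}
```

In this sketch the point where a job is appended to `order` is where a workflow AM would presumably reuse already-allocated containers for the next job instead of starting a new application master, which is the resource saving the description argues for.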