hadoop-mapreduce-issues mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
Date Thu, 02 Aug 2012 17:48:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427484#comment-13427484 ]

Josh Wills commented on MAPREDUCE-4495:
---------------------------------------

@Chris, thanks for clarifying your meaning. I think the line that threw me for a loop in
your original comment was "Putting code into Apache Hadoop is the same as putting code in
yet-to-be-named-Apache-Incubator-project," since for that to be true, the incubating project
would need to have, at the very least, a source code repository created by infrastructure.

And of course, we shouldn't rely on anyone's anecdotal experience, especially when we have
tools that can show us resolution times for infrastructure issues tagged with Subversion going
all the way back to 2005:

Quarter	Closed	Days	Days/Closed
Q1/2005	1	17	17
Q2/2005	18	459	25
Q3/2005	24	403	16
Q4/2005	12	98	8
Q1/2006	8	104	13
Q2/2006	8	149	18
Q3/2006	8	483	60
Q4/2006	10	745	74
Q1/2007	5	733	146
Q2/2007	4	27	6
Q3/2007	1	9	9
Q4/2007	1	0	0
Q1/2008	13	328	25
Q2/2008	7	90	12
Q3/2008	3	65	21
Q4/2008	7	519	74
Q1/2009	9	1393	154
Q2/2009	4	92	23
Q3/2009	8	409	51
Q4/2009	9	934	103
Q1/2010	9	42	4
Q2/2010	13	749	57
Q3/2010	7	92	13
Q4/2010	17	1086	63
Q1/2011	11	102	9
Q2/2011	11	82	7
Q3/2011	17	96	5
Q4/2011	10	72	7
Q1/2012	20	72	3
Q2/2012	13	129	9
Q3/2012	2	10	5

Or for Git, since it was added in 2009:

Quarter	Closed	Days	Days/Closed
Q1/2009	2	6	3
Q2/2009	22	158	7
Q3/2009	17	249	14
Q4/2009	4	114	28
Q1/2010	12	266	22
Q2/2010	11	80	7
Q3/2010	15	271	18
Q4/2010	17	112	6
Q1/2011	15	379	25
Q2/2011	14	7	0
Q3/2011	33	2281	69
Q4/2011	33	632	19
Q1/2012	23	403	17
Q2/2012	28	491	17
Q3/2012	18	1074	59

Sigh. The median would be so much more useful, right?
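
To illustrate why the median would help, here is a minimal sketch. The per-issue resolution times below are invented for illustration; only their count (9) and total (1393 days) match the Q1/2009 Subversion row above. The last table column appears to be total days divided by issues closed, rounded down, and a single long-running issue is enough to dominate it.

```python
from statistics import median

# HYPOTHETICAL per-issue resolution times in days. Only the count (9) and the
# total (1393) match the Q1/2009 Subversion row; the individual values are
# invented to show how one outlier skews the average.
resolution_days = [1, 1, 2, 2, 3, 4, 5, 10, 1365]


def summarize(days):
    """Return the table's columns plus the median for a list of resolution times."""
    closed = len(days)
    total = sum(days)
    return {
        "closed": closed,
        "days": total,
        # Integer division mirrors the rounded-down figures in the tables.
        "days_per_closed": total // closed,
        "median": median(days),
    }


summary = summarize(resolution_days)
print(summary)
```

With this made-up data the average is 154 days per issue while the median is 3 days, so the two statistics tell very different stories about a typical request.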
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master that is capable of running a DAG of
> jobs. The workflow client submits a DAG request to the AM, and the AM then manages the
> life cycle of the application: requesting the needed resources from the RM, and starting,
> monitoring, and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some
> of the advantages:
>  - Fewer consumed resources, since only one application master will be spawned for the
>    whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple consecutive jobs
>    in the workflow (no need to request/wait for resources for every individual job from
>    the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the workflow
>    (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like Pig and
>    Hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
