hadoop-general mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Oozie vs. YARN "application"
Date Fri, 18 May 2012 05:11:13 GMT

The documentation is incorrect.  The plan is to be able to support such a thing, but it has
not been implemented yet.  I would like to see it be part of the core map/reduce when it does
happen, because there are several different projects that could share this functionality, like
Oozie, Pig and Hive.  So in that case, when it does show up it is likely to be a superset of
the functionality supported by Oozie, minus the functionality that Arun mentioned, like triggering
of jobs through data availability and on a regular time interval. Hopefully Oozie would eventually
also move to use it.  It would also allow such projects to potentially share DAG-level optimizations,
like reducing or even eliminating writing temporary output to HDFS in between small jobs, similar
to what Spark does.

A DAGApplicationMaster would probably not be a DAG of generic applications; it would probably
be a DAG of mapreduce jobs with a few other things like what Oozie supports in its DAG definitions.
 The reason is that for the DAG Application Master to truly be generic it would
need to launch other Application Masters in separate containers, whereas if we limit it to
just a subset of AMs we would not have to launch the separate processes, and we could provide
the MR-specific DAG-level optimizations like I stated previously.  We could still support
launching of other AMs for completeness' sake, but I see that as a lower priority.
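
To make the idea concrete, here is a minimal sketch of running a DAG of jobs in dependency
order inside a single process.  It is purely illustrative and is not an existing YARN or
MapReduce API: run_dag, the job names, and the use of plain callables as stand-ins for MR
jobs are all assumptions made for the example.

```python
from collections import deque


def run_dag(jobs, deps):
    """Run each job once all of its upstream jobs have finished.

    jobs: dict mapping a job name to a zero-argument callable. The callable is
          a hypothetical stand-in for an MR job run in the same process.
    deps: dict mapping a job name to the list of job names it depends on.
          Every name used in deps is assumed to also be a key of jobs.
    Returns the list of job names in the order they were run.
    """
    # Number of unfinished upstream jobs for each job.
    remaining = {name: len(deps.get(name, [])) for name in jobs}

    # Reverse edges: which jobs become closer to runnable when a job finishes.
    downstream = {name: [] for name in jobs}
    for name, upstream_names in deps.items():
        for upstream in upstream_names:
            downstream[upstream].append(name)

    # Start with the jobs that have no dependencies at all.
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    order = []
    while ready:
        name = ready.popleft()
        jobs[name]()  # run in-process, no separate container or AM
        order.append(name)
        for child in downstream[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    # Any job left unrun at this point is part of a dependency cycle.
    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order
```

A truly generic version would have to replace the in-process call with launching a separate
Application Master per node of the DAG, which is the extra cost described above.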

--Bobby Evans

On 5/17/12 9:29 PM, "Keith Wiley" <kwiley@keithwiley.com> wrote:

On May 17, 2012, at 17:49 , Arun C Murthy wrote:

> Currently YARN doesn't offer anything to manage a DAG of applications.

Well, there is the following webpage:

which suggests that YARN supports a dag of MR jobs within a YARN application (second paragraph,
last sentence).  True, it is a dag of jobs within an application, not a dag of applications,
but that wasn't really my original question.  My question was how the dag structure offered
by YARN differs from that offered by Oozie.

It doesn't seem like the responses to my question so far have adequately reconciled Oozie's
dag of jobs with YARN's dag of jobs.  On the contrary, the only response I've gotten so far
seems to suggest that the webpage above is simply wrong and YARN offers no form of multi-job
dag at all; for example, no response in this thread has confirmed it.

> It's fairly easy to implement a DAGApplicationMaster to manage a set of applications
> (whether MR or others).

Right, but that applies to whole applications.  Isn't a dag *of* jobs within an application
rather analogous to what Oozie does?  Bear in mind, that is the entire premise of my original
question (the degree of similarity between these two multi-job dag coordination systems).
 The distinction between jobs and applications is only relevant after the relationship to
Oozie has been established, since that was my original question.

I'm really sorry about the apparent misunderstanding.  I didn't intend any confusion on the
matter.  I simply read the webpage and was immediately curious about its implications
for Oozie, that's all.

> Arun
> PS: Please use mapreduce-dev@ for technical discussions, general@ is used for project
> discussions/announcements. Thanks.

Oof, sorry about that.  It's hard to move a thread mid-discussion of course since that messes
up the archives and I still don't feel that the text on the webpage quoted above, which clearly
describes YARN's dag of jobs, has been addressed, so I'm carrying on for the sake of "the
historical record", but I apologize for not targeting my question at the most relevant mailing
list.  A mailing list named "general" struck me as, well, general, but I must have misinterpreted

Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
