hadoop-general mailing list archives

From Steve Loughran <steve.lough...@gmail.com>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 16:59:55 GMT
On 25 July 2012 18:40, Arun C Murthy <acm@hortonworks.com> wrote:

> Folks,
> It's been nearly a year since we merged Hadoop YARN into trunk and we have
> made several releases since.
> It's exciting to see various open-source communities (both in the ASF and
> externally) start to explore integration with YARN such as Apache Hama,
> Apache Giraph, Apache S4, Spark etc. This promises to help us realize our
> hopes of making Apache Hadoop a much more general data processing platform
> (& storage, of course) and not tied to MapReduce alone for processing data.
> Furthermore, we already have people contributing interesting prototypes
> such as DistributedShell and PaaS on YARN.
> Given this, I think it would be useful to make YARN a sub-project of
> Apache Hadoop along with Common, HDFS & MapReduce. I believe this would
> help other communities realize that they could consider using YARN as a
> general-purpose resource management layer and help us enhance YARN beyond
> its humble beginnings.
> Clearly, YARN and MapReduce are different enough that they can and will
> attract a diverse community.
> I'd like to clarify that this proposal *does not* mean we move the code
> base out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there
> would be *no changes* to release cycles - YARN would be co-released with
> Common, HDFS & MapReduce.

If the goal is to clearly partition the scheduling layer from the app
layer, and you think it helps isolate changes, then yes.


Forcing that strict hierarchy does ensure that you really do have a clean
separation of modules, and emphasises that it is more than just MapRed. As
people add more applications, I can see that the separation would get their
needs addressed. Having a separate project could also allow Yarn to do a
point release in sync with those other projects, as well as do co-ordinated
releases with Hadoop itself.

It should also make clear that Yarn is designed to be a topology-aware
underpinning of a datacentre, interesting in its own right. Which reminds
me, I'd better get my topology stuff in.

