hadoop-general mailing list archives

From Jun Ping Du <...@vmware.com>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 23:03:44 GMT
+1. Separating YARN will definitely take some work, but it is worth it.



----- Original Message -----
From: "Arun C Murthy" <acm@hortonworks.com>
To: general@hadoop.apache.org
Sent: Thursday, July 26, 2012 9:40:21 AM
Subject: [DISCUSS] - YARN as a sub-project of Apache Hadoop


It's been nearly a year since we merged Hadoop YARN into trunk, and we have made several releases.

It's exciting to see various open-source communities (both in the ASF and externally) start
to explore integration with YARN such as Apache Hama, Apache Giraph, Apache S4, Spark etc.
This promises to help us realize our hopes of making Apache Hadoop a much more general data
processing platform (& storage, of course) and not tied to MapReduce alone for processing
data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell
and PaaS on YARN.

Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with
Common, HDFS & MapReduce. I believe this would help other communities realize that they
could consider using YARN as a general-purpose resource management layer and help us enhance
YARN beyond its humble beginnings.

Clearly, YARN and MapReduce are different enough that they can and will attract a diverse

I'd like to clarify that this proposal *does not* mean we move the code base out of the hadoop/common/
tree. It just elevates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce
in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released
with Common, HDFS & MapReduce.



What does this mean for the Hadoop developer community?

# Project dependencies

The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce.
As today, the dependencies *do not change*: 
- Common is the base
- HDFS depends only on Common
- YARN depends only on Common & HDFS 
- MapReduce depends on Common, HDFS & YARN.
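The dependency list above is a strict layering: each sub-project depends only on sub-projects that sit below it. As a minimal illustration (a hypothetical check, not part of Hadoop's actual Maven build; the lowercase names are labels chosen for this sketch), the rule can be encoded and verified in a few lines of bash 4+:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the proposed sub-project dependencies, encoded as data.
# Names are illustrative labels only, not real Maven artifact IDs.
declare -A DEPS=(
  [common]=""
  [hdfs]="common"
  [yarn]="common hdfs"
  [mapreduce]="common hdfs yarn"
)
# Layers from bottom to top: Common is the base, MapReduce sits on top.
ORDER=(common hdfs yarn mapreduce)

# Succeeds (prints "ok") only if every sub-project depends solely on
# sub-projects that appear earlier in ORDER, i.e. there are no upward
# or circular dependencies.
check_layering() {
  local seen=" "
  local p d
  for p in "${ORDER[@]}"; do
    for d in ${DEPS[$p]}; do
      if [[ "$seen" != *" $d "* ]]; then
        echo "violation: $p depends on $d, which is not below it"
        return 1
      fi
    done
    seen+="$p "
  done
  echo "ok"
}

check_layering
```

Under this sketch, making HDFS depend on YARN, for example, would be reported as a violation, which matches the proposal's point that the dependencies *do not change* when YARN becomes its own sub-project.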

# Jira & Mailing lists

We would have a separate YARN jira project and a yarn-dev@ mailing list.

We already file separate MAPREDUCE jira issues for changes to YARN (ResourceManager,
NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime
etc.). Hence, this isn't much of a change.

# Subversion

Not much at all! YARN has, since the beginning, been developed with the understanding that
it is largely independent of MapReduce, and the code bases are already separate, i.e. hadoop-mapreduce-project/hadoop-yarn
and hadoop-mapreduce-project/hadoop-mapreduce-client.

Essentially the change would be:
$ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
... and the necessary, albeit small, changes to our maven build infrastructure.

# Release Cycles

No changes.

YARN would be co-released with Common, HDFS & MapReduce, as is the case today.

