hadoop-general mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 02:11:53 GMT
Hi Chris,

On Jul 25, 2012, at 7:03 PM, Mattmann, Chris A (388J) wrote:

> Hi Arun,
> IMHO, it sounds like you guys might be better off proposing a new project for the Apache
> Looking at the things you list below the ---, it looks like an Incubator proposal minus
> the initial committer list, and affiliations and mentors/champions ;)

Fair point, thanks for chiming in Chris. However, I think we should revisit that when everything
in Apache Hadoop (Common, HDFS, YARN & MapReduce) is ready to fly out of the nest as separate
projects. That, I think, is too early; moreover, keeping Common, HDFS, YARN & MapReduce
together has value in ensuring that Hadoop continues to move along at a fair clip.

> If you don't want to go to that level, I don't think you guys need anyone's permission,
> and/or etc., right?
> If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and
> evolve it (it = the software and the community) how you guys see fit.

Agreed. Which is why I'm trying to gather consensus among the Hadoop community.


> Cheers,
> Chris
> On Jul 25, 2012, at 6:40 PM, Arun C Murthy wrote:
>> Folks,
>> It's been nearly a year since we merged Hadoop YARN into trunk and we have made several
>> releases since.
>> It's exciting to see various open-source communities (both in the ASF and externally)
>> start to explore integration with YARN, such as Apache Hama, Apache Giraph, Apache S4, Spark
>> etc. This promises to help us realize our hopes of making Apache Hadoop a much more general
>> data processing platform (& storage, of course) that is not tied to MapReduce alone for
>> processing data. Furthermore, we already have people contributing interesting prototypes
>> such as DistributedShell and PaaS on YARN.
>> Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop
>> along with Common, HDFS & MapReduce. I believe this would help other communities realize
>> that they could consider using YARN as a general-purpose resource management layer, and
>> help us enhance YARN beyond its humble beginnings.
>> Clearly, YARN and MapReduce are different enough that they can and will attract a
>> diverse community.
>> I'd like to clarify that this proposal *does not* mean we move the code base out
>> of the hadoop/common/ tree. It just elevates hadoop-yarn alongside hadoop-common, hadoop-hdfs
>> & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles:
>> YARN would be co-released with Common, HDFS & MapReduce.
>> Thoughts?
>> ----
>> What does it mean to the Hadoop developer community?
>> # Project dependencies
>> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN &
>> MapReduce. The dependencies *do not change* from what they are today:
>> - Common is the base
>> - HDFS depends only on Common
>> - YARN depends only on Common & HDFS 
>> - MapReduce depends on Common, HDFS & YARN.
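The dependency layering listed above can be expressed as a small directed graph. What follows is a minimal illustrative sketch, assuming Python 3.9+ for the standard-library `graphlib` module; the `DEPENDENCIES` mapping and the `build_order` helper are hypothetical names, not anything from the Hadoop build. It shows that the layering admits exactly one order in which every sub-project comes after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the proposed layering: each sub-project maps to
# the set of sub-projects it depends on (its predecessors in the graph).
DEPENDENCIES = {
    "Common": set(),
    "HDFS": {"Common"},
    "YARN": {"Common", "HDFS"},
    "MapReduce": {"Common", "HDFS", "YARN"},
}


def build_order(deps):
    """Return an order in which each sub-project follows all of its dependencies."""
    # TopologicalSorter raises graphlib.CycleError if the layering had a cycle.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    # Each layer depends on every layer below it, so the order is unique.
    print(build_order(DEPENDENCIES))  # prints ['Common', 'HDFS', 'YARN', 'MapReduce']
```

Because YARN depends on nothing above it, moving it to its own sub-project leaves this order, and therefore the co-release sequence, unchanged.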
>> # Jira & Mailing lists
>> We would have a separate YARN jira project and a yarn-dev@ mailing list.
>> We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager,
>> NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime
>> etc.). Hence, this isn't much of a change.
>> # Subversion
>> Not much at all! YARN has, since the beginning, been developed with the understanding
>> that it is very independent of MapReduce, and the code bases are already separate, i.e.
>> hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.
>> Essentially the change would be:
>> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
>> ... and the necessary, albeit small, changes to our maven build infrastructure.
>> # Release Cycles
>> No changes.
>> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
>> thanks,
>> Arun
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Arun C. Murthy
Hortonworks Inc.
