Date: Thu, 26 Jul 2012 14:17:10 -0700
Subject: Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
From: Doug Cutting
To: general@hadoop.apache.org

+1

This would be an improved layering of components.

As others have noted, we should probably stop using the term "subproject" for these, as that's most often used at Apache for things that are released independently. Better terms might be "components" or "modules".
Addressing that might also require restructuring the website.

Doug

On Wed, Jul 25, 2012 at 6:40 PM, Arun C Murthy wrote:
> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and externally) start to explore integration with YARN such as Apache Hama, Apache Giraph, Apache S4, Spark etc. This promises to help us realize our hopes of making Apache Hadoop a much more general data processing platform (& storage, of course) and not tied to MapReduce alone for processing data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond its humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code base out of hadoop/common/ tree. It just elevates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released with Common, HDFS & MapReduce.
>
> Thoughts?
>
> ----
>
> What does it mean to the Hadoop developer community?
>
> # Project dependencies
>
> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. As today, the dependencies *do not change*:
> - Common is the base
> - HDFS depends only on Common
> - YARN depends only on Common & HDFS
> - MapReduce depends on Common, HDFS & YARN.
>
> # Jira & Mailing lists
>
> We would have a separate YARN jira project and a yarn-dev@ mailing list.
>
> We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a change.
>
> # Subversion
>
> Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce and the code-bases are already independent, i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.
>
> Essentially the change would be:
> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> ... and the necessary, albeit small, changes to our maven build infrastructure.
>
> # Release Cycles
>
> No changes.
>
> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
>
> thanks,
> Arun