hadoop-general mailing list archives

From "Finger, Jay" <jfin...@ebay.com>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 17:15:05 GMT

I'm not sure what the goal of that is.  If this is an Apache
organizational/political thing then I am oblivious.

If the point is that YARN should not be a subproject of MapReduce, then I
agree completely.  Any argument for making YARN a subproject of MR could
equally be made for making it a subproject of MPI, Spark, etc.  And
obviously it cannot be a subproject of all of them.

To that end, YARN should be a peer of core and hdfs.  I prefer that MR
remain a peer of those as well, but since the current approach seems to
favor over-factoring things into painfully deep hierarchies, the
consistent thing to do would be to make MR a subproject of YARN (blech).
I prefer simple flat trees, though.


On 7/25/12 6:40 PM, "Arun C Murthy" <acm@hortonworks.com> wrote:

>It's been nearly a year since we merged Hadoop YARN into trunk and we
>have made several releases since.
>It's exciting to see various open-source communities (both in the ASF and
>externally) start to explore integration with YARN such as Apache Hama,
>Apache Giraph, Apache S4, Spark etc. This promises to help us realize our
>hopes of making Apache Hadoop a much more general data processing
>platform (& storage, of course) and not tied to MapReduce alone for
>processing data. Furthermore, we already have people contributing
>interesting prototypes such as DistributedShell and PaaS on YARN.
>Given this, I think it would be useful to make YARN a sub-project of
>Apache Hadoop along with Common, HDFS & MapReduce. I believe this would
>help other communities realize that they could consider using YARN as a
>general-purpose resource management layer and help us enhance YARN beyond
>its humble beginnings.
>Clearly, YARN and MapReduce are different enough that they can and will
>attract a diverse community.
>I'd like to clarify that this proposal *does not* mean we move the code
>base out of hadoop/common/ tree. It just elevates hadoop-yarn alongside
>hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also,
>there would be *no changes* to release cycles - YARN would be co-released
>with Common, HDFS & MapReduce.
>What does it mean to the Hadoop developer community?
># Project dependencies
>The change is that Hadoop would now have 4 sub-projects: Common, HDFS,
>YARN & MapReduce. As today, the dependencies *do not change*:
>- Common is the base
>- HDFS depends only on Common
>- YARN depends only on Common & HDFS
>- MapReduce depends on Common, HDFS & YARN.
># Jira & Mailing lists
>We would have a separate YARN jira project and a yarn-dev@ mailing list.
>We already use separate MAPREDUCE jira issues for making changes to YARN
>(ResourceManager, NodeManager) and to the MapReduce framework (MapReduce
>ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a
>change.
># Subversion
>Not much at all! YARN has, since the beginning, been developed with the
>understanding that it is very independent of MapReduce and the code-bases
>are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and
>Essentially the change would be:
>$ svn mv hadoop-mapreduce-project/hadoop-yarn
>... and the necessary, albeit small, changes to our maven build
># Release Cycles
>No changes.
>YARN would be co-released with Common, HDFS & MapReduce, as is the case
