hadoop-general mailing list archives

From "Aaron T. Myers" <...@cloudera.com>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 06:16:43 GMT
On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> I realize I'm asking a hard question here: why *aren't* they separate
> projects? What's the barrier? They seem
> to be operating that way (and have been for a while). And I don't see how
> Hadoop still couldn't move along at
> a fair clip with them as official TLPs themselves.

I'm opposed to this if for no other reason than that it makes it difficult
to make logically-individual changes which span the projects. As much as we
might like it to be the case, it is not presently true that Common is so
independent and stable from HDFS and MR/YARN that Common could reasonably
be separate and have its own release schedule. I think this view is
supported by the fact that we once had separate SVN repos for Common, HDFS,
and MR, but we undid that because having to make coordinated commits across
the several repos, and the complex build dependencies it induced, was too

The main reason I'm opposed to making them separate projects is that I
don't think their internal interfaces are so stable that they could
reasonably release independently. Though we've been pretty good at
maintaining the stability of the external interfaces, we routinely make
changes in the internal interfaces of Common/HDFS/MR that make the projects
fairly tightly-coupled. Note that Arun's proposal specifically calls out
that the sub-projects would still release together, which I support.

> Yeah I know you are doing great -- my point is, technically, what consensus
> is required -- you develop code at Apache
> as individuals -- code is committed -- as are patches, etc. The PMC is
> there to regulate that, but it sounds like code wise
> you are proposing an svn mv command -- do you need an email thread to
> discuss that? Why not just do it, and if someone
> has a problem, *then* discuss? Dunno, that's just my opinion.

I for one really appreciate Arun having this discussion beforehand. Making
a change like this, even if it ends up being uncontroversial, will at least
be quite disruptive to the developers working on Hadoop daily. I think it's
great that Arun sought out feedback first to make sure folks agree that
it's a worthwhile change to make.

> The things that you are proposing that are new (e.g., mailing lists) will
> serve to splinter (at least the discussion in) the community IMHO --
> this is spoken from experience in 2 situations (Nutch, Lucene) where we
> had an umbrella project with tons of virtual "sub projects" that
> in the end have thrived as their own individual projects. If you are going
> to go that far, why not create a new Incubator project and just do
> it clean from the start?

We recently discussed (and approved) merging all of the Hadoop *-user@
mailing lists, so as to not splinter the user community, and make the
project more approachable for users. In my experience, I've seen most
developers (myself included) subscribe to all of the *-dev@ mailing lists.
Even though I personally subscribe to all of them, I still prefer to have
them separate, so that I can easily set up email filters/labels.

Aaron T. Myers
Software Engineer, Cloudera
