hadoop-general mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 15:00:20 GMT
Hey Aaron,

On Jul 25, 2012, at 11:16 PM, Aaron T. Myers wrote:

> On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
> 
>> I realize I'm asking a hard question here: why *aren't* they separate
>> projects? What's the barrier? They seem
>> to be operating that way (and have been for a while). And I don't see how
>> Hadoop still couldn't move along at
>> a fair clip with them as official TLPs themselves.
>> 
> 
> I'm opposed to this if for no other reason than that it makes it difficult
> to make logically-individual changes which span the projects. As much as we
> might like it to be the case, it is not presently true that Common is so
> independent and stable from HDFS and MR/YARN that Common could reasonably
> be separate and have its own release schedule. I think this view is
> supported by the fact that we once had separate SVN repos for Common, HDFS,
> and MR, but we undid that because having to make coordinated commits across
> the several repos, and the complex build dependencies it induced, was too
> onerous.

Fair enough.

> 
> The main reason I'm opposed to making them separate projects is that I
> don't think their internal interfaces are so stable that they could
> reasonably release independently. Though we've been pretty good at
> maintaining the stability of the external interfaces, we routinely make
> changes in the internal interfaces of Common/HDFS/MR that make the projects
> fairly tightly-coupled. Note that Arun's proposal specifically calls out
> that the sub-projects would still release together, which I support.

Sub projects are not a good thing at Apache. Well, "official" sub projects 
that have their own committees, mailing lists, etc. You guys aren't talking
about sub projects (though you call them that) -- in reality you are talking
about *products* that the Apache Hadoop PMC releases. They may have
different names, be on different release schedules, have different mailing
lists even (which I still think is not the right thing to do), but they are not *projects*.

I guess that's one thing that got me confused with Arun's original proposal:
in it there is talk of different sub-*projects* and making YARN a new sub-*project*
and discussion of it and Map Reduce and each attracting a diverse (implied: different)
community. 

If you guys are talking about *products* that themselves have different *communities*
then pretty much at Apache those are different *projects*. 

If you are talking about different *products* that themselves have *the same community*
who releases those *products* then we are talking about a single *project* at Apache
that has different *products* that it releases (am I confusing you yet?) :)

Regardless, I guess in the end what I was questioning was that if you look
at the net of Arun's proposal minus Project Dependencies (which is really
code level things -- at Apache code is one thing, but we are dealing with
*communities*), and Release Cycles (no changes), the proposal boils down
to: 

1. Creating separate mailing lists for YARN
2. an svn mv command

My advice on #1 was to be careful about splitting mailing lists, I've seen that cause trouble
(even before Hadoop existed and in other Apache projects I've cited), and then on #2,
why not execute the svn mv command and just move forward? You all are on the Hadoop
PMC and I assume trust Arun (and that he trusts you guys since you've given each other
the commit bit), so move forward on it.

As for #2, your point about being happy that Arun brought this up, since it would have
an impact on the build cycle etc., makes sense and is a good reason to DISCUSS it.


> 
>> Yeah I know you are doing great -- my point is, technically, what consensus
>> is required -- you develop code at Apache
>> as individuals -- code is committed -- as are patches, etc. The PMC is
>> there to regulate that, but it sounds like code wise
>> you are proposing an svn mv command -- do you need an email thread to
>> discuss that? Why not just do it, and if someone
>> has a problem, *then* discuss? Dunno, that's just my opinion.
>> 
> 
> I for one really appreciate Arun having this discussion beforehand. Making
> a change like this, even if it ends up being uncontroversial, will at least
> be quite disruptive to the developers working on Hadoop daily. I think it's
> great that Arun sought out feedback first to make sure folks agree that
> it's a worthwhile change to make.

Yep thanks. This is good validation for #2 above then.

> 
> 
>> 
>> The things that you are proposing that are new (e.g., mailing lists) will
>> serve to splinter (at least the discussion in) the community IMHO --
>> this is spoken from experience in 2 situations (Nutch, Lucene) where we
>> had an umbrella project with tons of virtual "sub projects" that
>> in the end have thrived as their own individual projects. If you are going
>> to go that far, why not create a new Incubator project and just do
>> it clean from the start?
>> 
> 
> We recently discussed (and approved) merging all of the Hadoop
> *-user@ mailing lists, so as to not splinter the user community, and make the
> project more approachable for users. In my experience, I've seen most
> developers (myself included) subscribe to all of the *-dev@ mailing lists.
> Even though I personally subscribe to all of them, I still prefer to have
> them separate, so that I can easily set up email filters/labels.

Yeah, that's cool. I do the same myself and that makes sense. It just
seemed like a formal proposal to create a project, minus the creating
project thing, so I thought I'd ask.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

