hadoop-general mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 15:00:36 GMT
Thanks for your comments Bobby, makes sense.


On Jul 26, 2012, at 7:28 AM, Robert Evans wrote:

> +1 for what Aaron said.  The projects are not ready to split yet.
> MAPREDUCE-3300 for example.  YARN cannot display a UI for aggregated
> container logs unless we also have the MR History Server up and running.
> If we do want to split all of the projects HDFS, COMMON, YARN, and
> MAPREDUCE it will take some feature and design work to get the APIs to a
> point that there are no more @LimitedPrivate APIs.  I personally would
> like to see this happen eventually, but it is not something on my priority
> list.
> --Bobby Evans
> On 7/26/12 1:16 AM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>> On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>> I realize I'm asking a hard question here: why *aren't* they separate
>>> projects? What's the barrier? They seem to be operating that way (and
>>> have been for a while). And I don't see how Hadoop still couldn't move
>>> along at a fair clip with them as official TLPs themselves.
>> I'm opposed to this if for no other reason than that it makes it difficult
>> to make logically-individual changes which span the projects. As much as we
>> might like it to be the case, it is not presently true that Common is so
>> independent and stable from HDFS and MR/YARN that Common could reasonably
>> be separate and have its own release schedule. I think this view is
>> supported by the fact that we once had separate SVN repos for Common, HDFS,
>> and MR, but we undid that because having to make coordinated commits across
>> the several repos, and the complex build dependencies it induced, was too
>> onerous.
>> The main reason I'm opposed to making them separate projects is that I
>> don't think their internal interfaces are so stable that they could
>> reasonably release independently. Though we've been pretty good at
>> maintaining the stability of the external interfaces, we routinely make
>> changes in the internal interfaces of Common/HDFS/MR that make the projects
>> fairly tightly-coupled. Note that Arun's proposal specifically calls out
>> that the sub-projects would still release together, which I support.
>>> Yeah I know you are doing great -- my point is, technically, what consensus
>>> is required -- you develop code at Apache as individuals -- code is
>>> committed -- as are patches, etc. The PMC is there to regulate that, but it
>>> sounds like code-wise you are proposing an svn mv command -- do you need an
>>> email thread to discuss that? Why not just do it, and if someone has a
>>> problem, *then* discuss? Dunno, that's just my opinion.
>> I for one really appreciate Arun having this discussion beforehand. Making
>> a change like this, even if it ends up being uncontroversial, will at least
>> be quite disruptive to the developers working on Hadoop daily. I think it's
>> great that Arun sought out feedback first to make sure folks agree that
>> it's a worthwhile change to make.
>>> The things that you are proposing that are new (e.g., mailing lists) will
>>> serve to splinter (at least the discussion in) the community IMHO --
>>> this is spoken from experience in 2 situations (Nutch, Lucene) where we
>>> had umbrella projects with tons of virtual "sub projects" that
>>> in the end have thrived as their own individual projects. If you are going
>>> to go that far, why not create a new Incubator project and just do
>>> it clean from the start?
>> We recently discussed (and approved) merging all of the Hadoop *-user@
>> mailing lists, so as to not splinter the user community, and make the
>> project more approachable for users. In my experience, I've seen most
>> developers (myself included) subscribe to all of the *-dev@ mailing lists.
>> Even though I personally subscribe to all of them, I still prefer to have
>> them separate, so that I can easily set up email filters/labels.
>> --
>> Aaron T. Myers
>> Software Engineer, Cloudera

Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
