hadoop-general mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Date Thu, 26 Jul 2012 14:28:27 GMT
+1 for what Aaron said.  The projects are not ready to split yet.
MAPREDUCE-3300 for example.  YARN cannot display a UI for aggregated
container logs unless we also have the MR History Server up and running.
If we do want to split all of the projects HDFS, COMMON, YARN, and
MAPREDUCE it will take some feature and design work to get the APIs to a
point that there are no more @LimitedPrivate APIs.  I personally would
like to see this happen eventually, but it is not something on my priority
list.

--Bobby Evans

On 7/26/12 1:16 AM, "Aaron T. Myers" <atm@cloudera.com> wrote:

>On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>> I realize I'm asking a hard question here: why *aren't* they separate
>> projects? What's the barrier? They seem
>> to be operating that way (and have been for a while). And I don't see
>> why Hadoop still couldn't move along at
>> a fair clip with them as official TLPs themselves.
>I'm opposed to this if for no other reason than that it makes it difficult
>to make logically-individual changes which span the projects. As much as we
>might like it to be the case, it is not presently true that Common is so
>independent and stable from HDFS and MR/YARN that Common could reasonably
>be separate and have its own release schedule. I think this view is
>supported by the fact that we once had separate SVN repos for Common, HDFS,
>and MR, but we undid that because having to make coordinated commits across
>the several repos, and the complex build dependencies it induced, was too
>much.
>
>The main reason I'm opposed to making them separate projects is that I
>don't think their internal interfaces are so stable that they could
>reasonably release independently. Though we've been pretty good at
>maintaining the stability of the external interfaces, we routinely make
>changes in the internal interfaces of Common/HDFS/MR that make the projects
>fairly tightly-coupled. Note that Arun's proposal specifically calls out
>that the sub-projects would still release together, which I support.
>> Yeah I know you are doing great -- my point is, technically, what
>> is required -- you develop code at Apache
>> as individuals -- code is committed -- as are patches, etc. The PMC is
>> there to regulate that, but it sounds like code wise
>> you are proposing an svn mv command -- do you need an email thread to
>> discuss that? Why not just do it, and if someone
>> has a problem, *then* discuss? Dunno, that's just my opinion.
>I for one really appreciate Arun having this discussion beforehand. Making
>a change like this, even if it ends up being uncontroversial, will at least
>be quite disruptive to the developers working on Hadoop daily. I think it's
>great that Arun sought out feedback first to make sure folks agree that
>it's a worthwhile change to make.
>> The things that you are proposing that are new (e.g., mailing lists)
>> serve to splinter (at least the discussion in) the community IMHO --
>> this is spoken from experience in 2 situations (Nutch, Lucene) where we
>> had an umbrella project with tons of virtual "sub projects" that
>> in the end have thrived as their own individual projects. If you are going
>> to go that far, why not create a new Incubator project and just do
>> it clean from the start?
>We recently discussed (and approved) merging all of the Hadoop
>*-user@ mailing lists, so as to not splinter the user community, and
>make the project more approachable for users. In my experience, I've seen
>most
>developers (myself included) subscribe to all of the *-dev@ mailing lists.
>Even though I personally subscribe to all of them, I still prefer to have
>them separate, so that I can easily set up email filters/labels.
>Aaron T. Myers
>Software Engineer, Cloudera
