hadoop-general mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: [VOTE] - Establish YARN as a sub-project of Apache Hadoop
Date Fri, 17 Aug 2012 00:41:02 GMT
The HDFS and MapReduce committer roles were split to ease the
transition to TLPs, where both projects could manage and release their
code independently. However, after an initial burst of enthusiasm,
progress has been trending toward merging the projects back together.
The original rationale for splitting committer roles does not hold and
will not be realized in the foreseeable future, so the two lists
should merge again.

Similarly, either Yarn has reached escape velocity and should be a TLP
(with a sensible subset of its contributors) or it will be developed,
released, and managed concurrently with HDFS and MapReduce. We should
stop pretending that we're close to managing and releasing subprojects
independently in opposition to reality. That fiction is drawing
*meaningful, but irrelevant* distinctions between contributors.
Anticipating a split with a separate committer list has not been a
successful pattern, as measured in pending patches,
administrative/developer overhead, and distracting email threads like
this one.

However, contrary appeals to emotional reasoning citing "exclusion" or
a "lack of trust in contributors" are lazy and invalid. The proposal
is a dry partitioning for a specific, administrative purpose: to be
transparent about criteria to be included in a Yarn TLP while that
project is still managed here. But if that promotion is not imminent --
if, for the foreseeable future, Yarn will be developed and released
with HDFS and MapReduce -- then the cost of preparing for a split that
doesn't happen is all overhead. There is no value in incurring it.

Neither is there value in discussing this as anything but an
administrative action with concrete criteria and consequences for
success (TLP status in a reasonable, set timeframe) and failure
(continued dependency and co-development, roles are merged back).
Define these and drop the drama.

tl;dr: I agree with Chris Mattmann: start making subprojects into TLPs
or flatten the roles. These half-measures are expensive drains on
attention. -C

On Thu, Aug 16, 2012 at 1:21 PM, Eli Collins <eli@cloudera.com> wrote:
> On Thu, Aug 16, 2012 at 1:11 PM, Mattmann, Chris A (388J)
> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> Hi Eli,
>>
>> On Aug 16, 2012, at 1:08 PM, Eli Collins wrote:
>>
>>> On Thu, Aug 16, 2012 at 12:59 PM, Mattmann, Chris A (388J)
>>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>>> Hi Guys,
>>>>
>>>> The existing discussion and conversation below is the precise reason that I suggested
>>>> Hadoop consider spinning out the rest of its *products* as *projects*. Folks have piped
>>>> up and listed technical reasons as the challenges behind this, and then responded
>>>> with clear community reasons either by their actions, or by other means.
>>>>
>>>> Having distinct communities as indicated by distinct lists of committers isn't wrong -- it's
>>>> usually however exemplified by having a distinct Apache project. It sounds like those
>>>> folks that have been working on YARN for 1.5 years+ as stated would like to have their
>>>> own distinct Apache community.
>>>
>>> Not sure that's the case, eg I think we all want to keep yarn in the
>>> same code repository, allow patches that update it along with other
>>> hadoop subprojects, co-design it with MR, test/release it together as
>>> well.  That's not a sign of a distinct community.
>>
>> Keeping code in the same repository with a PMC with different sets of
>> permissions in that repository *is* the sign of a distinct community.
>> Doesn't matter if you want the code there together, and allowing patches,
>> and testing and releasing and whatever. Those are technical issues.
>> Having code with *different* rules for *the same* community members
>> is not what I know as "community over code" and the Apache way.
>> And ultimately it's the reason why this email thread won't die. There's
>> an elephant in the room here (pun intended).
>
> For the PMC the permissions are the same across all the subprojects
> and it is the same community. The question here is around committers.
>
> We already have a working model where the various subprojects have
> their own committers and people are trusted to commit across project
> boundaries when it makes sense. I think everyone is in agreement that
> yarn, like the other subprojects, will have its own set of
> committers, the only open question here as I understand it is whether
> or not we grandfather in the existing MR committers (since that's
> where YARN used to live).  Ie I don't see this as a major elephant in
> the room, just need to decide between Arun's proposal where we
> actively exclude these people or Tom's proposal where we don't.
> Perhaps we should just vote.
>
> Thanks,
> Eli
>
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
