spark-dev mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Fri, 07 Nov 2014 01:28:52 GMT
It looks like the difference between the proposed Spark model and the
CloudStack / SVN model is:
* In the former, maintainers / partial committers are a way of centralizing
oversight over particular components among committers
* In the latter, maintainers / partial committers are a way of giving
non-committers some power to make changes

-Sandy

On Thu, Nov 6, 2014 at 5:17 PM, Corey Nolet <cjnolet@gmail.com> wrote:

> PMC [1] is responsible for oversight and does not designate partial or full
> committer. There are projects where all committers become PMC and others
> where PMC is reserved for committers with the most merit (and willingness
> to take on the responsibility of project oversight, releases, etc...).
> Community maintains the codebase through committers. Committers mentor,
> roll in patches, and spread the project throughout other communities.
>
> Adding someone's name to a list as a "maintainer" is not a barrier. With a
> community as large as Spark's, and myself not being a committer on this
> project, I see it as a welcome opportunity to find a mentor in the areas in
> which I'm interested in contributing. We'd expect the list of names to grow
> as more volunteers gain more interest, correct? To me, that seems quite
> contrary to a "barrier".
>
> [1] http://www.apache.org/dev/pmc.html
>
>
> On Thu, Nov 6, 2014 at 7:49 PM, Matei Zaharia <matei.zaharia@gmail.com>
> wrote:
>
> > So I don't understand, Greg, are the partial committers committers, or
> are
> > they not? Spark also has a PMC, but our PMC currently consists of all
> > committers (we decided not to have a differentiation when we left the
> > incubator). I see the Subversion partial committers listed as
> "committers"
> > on https://people.apache.org/committers-by-project.html#subversion, so I
> > assume they are committers. As far as I can see, CloudStack is similar.
> >
> > Matei
> >
> > > On Nov 6, 2014, at 4:43 PM, Greg Stein <gstein@gmail.com> wrote:
> > >
> > > Partial committers are people invited to work on a particular area, and
> > they do not require sign-off to work on that area. They can get a
> sign-off
> > and commit outside that area. That approach doesn't compare to this
> > proposal.
> > >
> > > Full committers are PMC members. As each PMC member is responsible for
> > *every* line of code, then every PMC member should have complete rights
> to
> > every line of code. Creating disparity flies in the face of a PMC
> member's
> > responsibility. If I am a Spark PMC member, then I have responsibility
> for
> > GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
> > interposing a barrier inhibits my responsibility to ensure GraphX is
> > designed, maintained, and delivered to the Public.
> > >
> > > Cheers,
> > > -g
> > >
> > > (and yes, I'm aware of COMMITTERS; I've been changing that file for the
> > past 12 years :-) )
> > >
> > > On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> > > In fact, if you look at the Subversion committer list, the majority of
> > > people here have commit access only for particular areas of the
> > > project:
> > >
> > > http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
> > >
> > > On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> > > > Hey Greg,
> > > >
> > > > Regarding subversion - I think the reference is to partial vs full
> > > > committers here:
> > > > https://subversion.apache.org/docs/community-guide/roles.html
> > > >
> > > > - Patrick
> > > >
> > > > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gstein@gmail.com> wrote:
> > > >> -1 (non-binding)
> > > >>
> > > >> This is an idea that runs COMPLETELY counter to the Apache Way, and
> is
> > > > >> to be severely frowned upon. This creates *unequal* ownership of the
> > > >> codebase.
> > > >>
> > > >> Each Member of the PMC should have *equal* rights to all areas of
> the
> > > > >> codebase under their purview. It should not be subjected to others'
> > > > >> "ownership" except through the standard mechanisms of reviews and
> > > >> if/when absolutely necessary, to vetos.
> > > >>
> > > >> Apache does not want "leads", "benevolent dictators" or "assigned
> > > >> maintainers", no matter how you may dress it up with multiple
> > > >> maintainers per component. The fact is that this creates an unequal
> > > >> level of ownership and responsibility. The Board has shut down
> > > >> projects that attempted or allowed for "Leads". Just a few months
> ago,
> > > >> there was a problem with somebody calling themself a "Lead".
> > > >>
> > > >> I don't know why you suggest that Apache Subversion does this. We
> > > >> absolutely do not. Never have. Never will. The Subversion codebase
> is
> > > >> owned by all of us, and we all care for every line of it. Some
> people
> > > >> know more than others, of course. But any one of us, can change any
> > > >> part, without being subjected to a "maintainer". Of course, we ask
> > > >> people with more knowledge of the component when we feel
> > > >> uncomfortable, but we also know when it is safe or not to make a
> > > >> specific change. And *always*, our fellow committers can review our
> > > >> work and let us know when we've done something wrong.
> > > >>
> > > >> Equal ownership reduces fiefdoms, enhances a feeling of community
> and
> > > >> project ownership, and creates a more open and inviting project.
> > > >>
> > > >> So again: -1 on this entire concept. Not good, to be polite.
> > > >>
> > > >> Regards,
> > > >> Greg Stein
> > > >> Director, Vice Chairman
> > > >> Apache Software Foundation
> > > >>
> > > >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> I wanted to share a discussion we've been having on the PMC list,
> as
> > well as call for an official vote on it on a public list. Basically, as
> the
> > Spark project scales up, we need to define a model to make sure there is
> > still great oversight of key components (in particular internal
> > architecture and public APIs), and to this end I've proposed
> implementing a
> > maintainer model for some of these components, similar to other large
> > projects.
> > > >>>
> > > >>> As background on this, Spark has grown a lot since joining Apache.
> > We've had over 80 contributors/month for the past 3 months, which I
> believe
> > makes us the most active project in contributors/month at Apache, as well
> > as over 500 patches/month. The codebase has also grown significantly,
> with
> > new libraries for SQL, ML, graphs and more.
> > > >>>
> > > >>> In this kind of large project, one common way to scale development
> > is to assign "maintainers" to oversee key components, where each patch to
> > that component needs to get sign-off from at least one of its
> maintainers.
> > Most existing large projects do this -- at Apache, some large ones with
> > this model are CloudStack (the second-most active project overall),
> > Subversion, and Kafka, and other examples include Linux and Python. This
> is
> > also by-and-large how Spark operates today -- most components have a
> > de-facto maintainer.
> > > >>>
> > > >>> IMO, adopting this model would have two benefits:
> > > >>>
> > > >>> 1) Consistent oversight of design for that component, especially
> > regarding architecture and API. This process would ensure that the
> > component's maintainers see all proposed changes and consider them to fit
> > together in a good way.
> > > >>>
> > > >>> 2) More structure for new contributors and committers -- in
> > particular, it would be easy to look up who's responsible for each module
> > and ask them for reviews, etc, rather than having patches slip between
> the
> > cracks.
> > > >>>
> > > > >>> We'd like to start in a light-weight manner, where the model
> > only applies to certain key components (e.g. scheduler, shuffle) and
> > user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows,
> we
> > can expand it if we deem it useful. The specific mechanics would be as
> > follows:
> > > >>>
> > > >>> - Some components in Spark will have maintainers assigned to them,
> > where one of the maintainers needs to sign off on each patch to the
> > component.
> > > >>> - Each component with maintainers will have at least 2 maintainers.
> > > >>> - Maintainers will be assigned from the most active and
> > knowledgeable committers on that component by the PMC. The PMC can vote
> to
> > add / remove maintainers, and maintained components, through consensus.
> > > >>> - Maintainers are expected to be active in responding to patches
> for
> > their components, though they do not need to be the main reviewers for
> them
> > (e.g. they might just sign off on architecture / API). To prevent
> inactive
> > maintainers from blocking the project, if a maintainer isn't responding
> in
> > a reasonable time period (say 2 weeks), other committers can merge the
> > patch, and the PMC will want to discuss adding another maintainer.
> > > >>>
> > > >>> If you'd like to see examples for this model, check out the
> > following projects:
> > > > >>> - CloudStack:
> > > > >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > > > >>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
> > > >>>
> > > >>> Finally, I wanted to list our current proposal for initial
> > components and maintainers. It would be good to get feedback on other
> > components we might add, but please note that personnel discussions (e.g.
> > "I don't think Matei should maintain *that* component") should only happen
> > on the private list. The initial components were chosen to include all
> > public APIs and the main core components, and the maintainers were chosen
> > from the most active contributors to those modules.
> > > >>>
> > > >>> - Spark core public API: Matei, Patrick, Reynold
> > > >>> - Job scheduler: Matei, Kay, Patrick
> > > >>> - Shuffle and network: Reynold, Aaron, Matei
> > > >>> - Block manager: Reynold, Aaron
> > > >>> - YARN: Tom, Andrew Or
> > > >>> - Python: Josh, Matei
> > > >>> - MLlib: Xiangrui, Matei
> > > >>> - SQL: Michael, Reynold
> > > >>> - Streaming: TD, Matei
> > > >>> - GraphX: Ankur, Joey, Reynold
> > > >>>
> > > >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > > >>>
> > > >>> Matei
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org <mailto:
> > dev-unsubscribe@spark.apache.org>
> > > >> For additional commands, e-mail: dev-help@spark.apache.org <mailto:
> > dev-help@spark.apache.org>
> > > >>
> > >
> >
> >
>
