flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: [PROPOSAL] Structure the Flink Open Source Development
Date Thu, 12 May 2016 10:56:20 GMT
+1 for the proposal
On May 12, 2016 12:13 PM, "Stephan Ewen" <sewen@apache.org> wrote:

> Yes, Gabor Gevay, that did refer to you!
>
> Sorry for the ambiguity...
>
> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.marton@gmail.com> wrote:
>
> > +1 for the proposal
> > @ggevay: I do think that it refers to you. :)
> >
> > On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <ggab90@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > There are at least three Gábors in the Flink community,  :) so
> > > assuming that the Gábor in the list of maintainers of the DataSet API
> > > is referring to me, I'll be happy to do it. :)
> > >
> > > Best,
> > > Gábor G.
> > >
> > >
> > >
> > > 2016-05-10 11:24 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> > > > Hi everyone!
> > > >
> > > > We propose to establish some lightweight structures in the Flink open
> > > > source community and development process,
> > > > to help us better handle the increased interest in Flink (mailing list and
> > > > pull requests), while not overwhelming the
> > > > committers, and giving users and contributors a good experience.
> > > >
> > > > This proposal is triggered by the observation that we are reaching the
> > > > limits of where the current community can support
> > > > users and guide new contributors. The below proposal is based on
> > > > observations and ideas from Till, Robert, and me.
> > > >
> > > > ========
> > > > Goals
> > > > ========
> > > >
> > > > We try to achieve the following:
> > > >
> > > >   - Pull requests get handled in a timely fashion.
> > > >   - New contributors are better integrated into the community.
> > > >   - The community feels empowered on the mailing list, while questions
> > > >     that need someone with deep knowledge of a certain part of Flink
> > > >     still get that person's attention.
> > > >   - At the same time, the committers that are knowledgeable about many
> > > >     core parts do not get completely overwhelmed.
> > > >   - We don't overlook threads that report critical issues.
> > > >   - We always have a pretty good overview of the status of certain
> > > >     parts of the system:
> > > >       -> What are the frequently encountered known issues
> > > >       -> What are the most frequently requested features
> > > >
> > > >
> > > > ========
> > > > Problems
> > > > ========
> > > >
> > > > Looking into the process, there are two big issues:
> > > >
> > > > (1) Up to now, we have been relying on the fact that everything just
> > > > "organizes itself", driven by best effort. That assumes
> > > > that everyone feels equally responsible for every part, question, and
> > > > contribution. In the current state, this is impossible
> > > > to maintain: it overwhelms the committers and contributors.
> > > >
> > > > Example: Pull requests are picked up by whoever wants to pick them up.
> > > > Pull requests that are a lot of work, have little
> > > > chance of getting in, or relate to less active components are sometimes
> > > > not picked up. When contributors are pretty
> > > > loaded already, it may happen that no one eventually feels responsible to
> > > > pick up a pull request, and it falls through the cracks.
> > > >
> > > > (2) There is no good overview of the known shortcomings, efforts, and
> > > > requested features for different parts of the system.
> > > > This information exists in various people's heads, but is not easily
> > > > accessible for new people. The Flink JIRA is not well
> > > > maintained, so it is not easy to draw insights from it.
> > > >
> > > >
> > > > ===========
> > > > The Proposal
> > > > ===========
> > > >
> > > > Since we are building a parallel system, the natural solution seems to be:
> > > > partition the workload ;-)
> > > >
> > > > We propose to define a set of components for Flink. Each component is
> > > > maintained or tracked by one or more
> > > > people - let's call them maintainers. It is important to note that we don't
> > > > suggest the maintainers as an authoritative role, but
> > > > simply as committers or contributors that visibly step up for a certain
> > > > component, and mainly track and drive the efforts
> > > > pertaining to that component.
> > > >
> > > > It is also important to realize that we do not want to suggest that people
> > > > get less involved with certain parts and components, because
> > > > they are not the maintainers. We simply want to make sure that each pull
> > > > request or question or contribution has in the end
> > > > one person (or a small set of people) responsible for catching and tracking
> > > > it, if it was not worked on by the pro-active
> > > > community.
> > > >
> > > > For some components, having multiple maintainers will be helpful. In that
> > > > case, one maintainer should be the "chair" or "lead"
> > > > and make sure that no issue of that component gets lost between the
> > > > multiple maintainers.
> > > >
> > > >
> > > > A maintainer's role is:
> > > > -----------------------------
> > > >
> > > >   - Have an overview of which of the open pull requests relate to their
> > > >     component
> > > >   - Drive the pull requests relating to the component to resolution
> > > >       => Moderate the decision whether the feature should be merged
> > > >       => Make sure the pull request gets a shepherd.
> > > >            In many cases, the maintainers would shepherd themselves.
> > > >       => In case the shepherd becomes inactive, the maintainers need to
> > > >          find a new shepherd.
> > > >
> > > >   - Have an overview of what are the known issues of their component
> > > >   - Have an overview of what are the frequently requested features of their
> > > >     component
> > > >
> > > >   - Have an overview of which contributors are doing very good work in
> > > >     their component,
> > > >     would be candidates for committers, and should be mentored towards that.
> > > >
> > > >   - Resolve email threads that have been brought to their attention,
> > > >     because deeper
> > > >     component knowledge is required for that thread.
> > > >
> > > > A maintainer's role is NOT:
> > > > ----------------------------------
> > > >
> > > >   - Review all pull requests of that component
> > > >   - Answer every mail with questions about that component
> > > >   - Fix all bugs and implement all features of that component
> > > >
> > > >
> > > > We imagine the following way that the community and the maintainers
> > > > interact:
> > > >
> > > > ---------------------------------------------------------------------------------------------------------
> > > >
> > > >   - Pull requests should be tagged by component. Since we cannot add labels
> > > >     at this point, we need
> > > >     to rely on the following:
> > > >      => The pull request opener should name the pull request like
> > > >         "[FLINK-XXX] [component] Title"
> > > >      => Components can be (re) tagged by adding special comments in the
> > > >         pull request ("==> component client")
> > > >      => With some luck, GitHub and Apache Infra will allow us to use labels
> > > >         at some point
> > > >
> > > >   - When pull requests are associated with a component, the maintainers
> > > >     will manage them
> > > >     (decision whether to add, find shepherd, catch dropped pull requests)
> > > >
> > > >   - We assume that maintainers frequently reach out to other community
> > > >     members and ask them if they want
> > > >     to shepherd a pull request.
> > > >
> > > >   - On the mailing list, everyone should feel equally empowered to answer
> > > >     and discuss.
> > > >     If at some point in the discussion, some deep technical knowledge about
> > > >     a component is required,
> > > >     the maintainer(s) should be drawn into the discussion.
> > > >     Because the mailing list infrastructure has no support for tagging threads,
> > > >     here are some simple workarounds:
> > > >
> > > >     => One possibility is to put the maintainers' mail addresses on cc for
> > > >        the thread, so they get the mail
> > > >        not just via the mailing list
> > > >     => Another way would be to post something like "+maintainer runtime" in
> > > >        the thread and the "runtime"
> > > >        maintainers would have a filter/alert on these keywords in their
> > > >        mail program.
> > > >
> > > >   - We assume that maintainers will reach out to community members that are
> > > >     very active and helpful in
> > > >     a component, and will ask them if they want to be added as maintainers.
> > > >     That will make it visible that those people are experts for that part
> > > >     of Flink.
> > > >
> > > >
> > > > ======================================
> > > > Maintainers: Committers and Contributors
> > > > ======================================
> > > >
> > > > It helps if maintainers are committers (since we want them to resolve pull
> > > > requests which often involves
> > > > merging them).
> > > >
> > > > Components with multiple maintainers can easily have non-committer
> > > > contributors in addition to committer
> > > > contributors.
> > > >
> > > >
> > > > ======
> > > > JIRA
> > > > ======
> > > >
> > > > Ideally, JIRA can be used to get an overview of what are the known issues
> > > > of each component, and what are
> > > > common feature requests. Unfortunately, the Flink JIRA is quite unorganized
> > > > right now.
> > > >
> > > > A natural followup effort of this proposal would be to define in JIRA the
> > > > same components as we defined here,
> > > > and have the maintainers keep JIRA meaningful for that particular
> > > > component. That would allow us to
> > > > easily generate some tables out of JIRA (like top known issues per
> > > > component, most requested features)
> > > > and post them on the dev list once in a while as a "state of the union" report.
> > > >
> > > > Initial assignment of issues to components should be made by those people
> > > > opening the issue. The maintainer
> > > > of that tagged component needs to change the tag, if the component was
> > > > classified incorrectly.
> > > >
> > > >
> > > > ======================================
> > > > Initial Components and Maintainers Suggestion
> > > > ======================================
> > > >
> > > > Below is a suggestion of how to define components for Flink. One goal of
> > > > the division was to make it
> > > > obvious for the majority of questions and contributions to which component
> > > > they would relate. Otherwise,
> > > > if many contributions had fuzzy component associations, we would again not
> > > > solve the issue of having clear
> > > > responsibilities for who would track the progress and resolution.
> > > >
> > > > We also looked at each component and wrote the names of some people who we
> > > > thought were natural
> > > > experts for the components, and thus natural candidates for maintainers.
> > > >
> > > > **These names are only a starting point for discussion.**
> > > >
> > > > Once agreed upon, the components and names of maintainers should be kept in
> > > > the wiki and updated as
> > > > components change and people step up or down.
> > > >
> > > >
> > > > *DataSet API* (*Fabian, Greg, Gabor*)
> > > >   - Including Hadoop compat. parts
> > > >
> > > > *DataStream API* (*Aljoscha, Max, Stephan*)
> > > >
> > > > *Runtime*
> > > >   - Distributed Coordination (JobManager/TaskManager, Akka)  (*Till*)
> > > >   - Local Runtime (Memory Management, State Backends, Tasks/Operators) (*Stephan*)
> > > >   - Network (*Ufuk*)
> > > >
> > > > *Client/Optimizer* (*Fabian*)
> > > >
> > > > *Type system / Type extractor* (Timo)
> > > >
> > > > *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> > > >
> > > > *Libraries*
> > > >   - Gelly (*Vasia, Greg*)
> > > >   - ML (*Till, Theo*)
> > > >   - CEP (*Till*)
> > > >   - Python (*Chesnay*)
> > > >
> > > > *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> > > >
> > > > *Streaming Connectors* (*Robert*, *Aljoscha*)
> > > >
> > > > *Batch Connectors and Input/Output Formats* (*Chesnay*)
> > > >
> > > > *Storm Compatibility Layer* (*Mathias*)
> > > >
> > > > *Scala shell* (*Till*)
> > > >
> > > > *Startup Shell Scripts* (Ufuk)
> > > >
> > > > *Flink Build System, Maven Files* (*Robert*)
> > > >
> > > > *Documentation* (Ufuk)
> > > >
> > > >
> > > > Please let us know what you think about this proposal.
> > > > Happy discussing!
> > > >
> > > > Greetings,
> > > > Stephan
> > >
> >
>
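
The pull request tagging convention quoted above ("[FLINK-XXX] [component] Title" plus "==> component client" retag comments) could be parsed roughly as follows. This is only a sketch: the regular expressions, the function name component_for, and the rule that the latest retag comment overrides the title tag are assumptions for illustration, not part of the proposal or of any existing Flink tooling.

```python
import re
from typing import List, Optional

# Assumed shape of a tagged title, e.g. "[FLINK-1234] [runtime] Fix buffer leak".
TITLE_RE = re.compile(
    r"^\[FLINK-(?P<issue>\d+)\]\s*\[(?P<component>[^\]]+)\]\s*(?P<title>.+)$"
)

# Assumed shape of a retag comment line, e.g. "==> component client".
RETAG_RE = re.compile(
    r"^==>[ \t]*component[ \t]+(?P<component>.+?)[ \t]*$", re.MULTILINE
)


def component_for(title: str, comments: List[str]) -> Optional[str]:
    """Return the component a pull request is tagged with, or None.

    The title tag is the starting point; any retag comment replaces it,
    and the last retag comment (in the order given) wins.
    """
    component = None
    match = TITLE_RE.match(title.strip())
    if match:
        component = match.group("component").strip().lower()
    for comment in comments:
        for retag in RETAG_RE.finditer(comment):
            component = retag.group("component").strip().lower()
    return component
```

A maintainer could run something like this over the list of open pull requests to find the ones tagged with their component, and to spot untagged ones (where the function returns None).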
