flink-dev mailing list archives

From Gábor Gévay <gga...@gmail.com>
Subject Re: [PROPOSAL] Structure the Flink Open Source Development
Date Thu, 12 May 2016 08:40:41 GMT
Hello,

There are at least three Gábors in the Flink community :) so,
assuming that the Gábor in the list of maintainers of the DataSet API
refers to me, I'll be happy to do it. :)

Best,
Gábor G.



2016-05-10 11:24 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> Hi everyone!
>
> We propose to establish some lightweight structures in the Flink open
> source community and development process,
> to help us better handle the increased interest in Flink (mailing list and
> pull requests), while not overwhelming the
> committers, and giving users and contributors a good experience.
>
> This proposal is triggered by the observation that we are reaching the
> limits of where the current community can support
> users and guide new contributors. The below proposal is based on
> observations and ideas from Till, Robert, and me.
>
> ========
> Goals
> ========
>
> We try to achieve the following:
>
>   - Pull requests get handled in a timely fashion
>   - New contributors are better integrated into the community
>   - The community feels empowered on the mailing list.
>     But questions that need the attention of someone who has deep
> knowledge of a certain part of Flink get that attention.
>   - At the same time, the committers that are knowledgeable about many core
> parts do not get completely overwhelmed.
>   - We don't overlook threads that report critical issues.
>   - We always have a pretty good overview of what the status of certain
> parts of the system is.
>       -> What are often encountered known issues
>       -> What are the most frequently requested features
>
>
> ========
> Problems
> ========
>
> Looking into the process, there are two big issues:
>
> (1) Up to now, we have been relying on the fact that everything just
> "organizes itself", driven by best effort. That assumes
> that everyone feels equally responsible for every part, question, and
> contribution. In the current state, this is impossible
> to maintain; it overwhelms the committers and contributors.
>
> Example: Pull requests are picked up by whoever wants to pick them up. Pull
> requests that are a lot of work, have little
> chance of getting in, or relate to less active components are sometimes not
> picked up. When contributors are pretty
> loaded already, it may happen that no one eventually feels responsible to
> pick up a pull request, and it falls through the cracks.
>
> (2) There is no good overview of what are known shortcomings, efforts, and
> requested features for different parts of the system.
> This information exists in various people's heads, but is not easily
> accessible for new people. The Flink JIRA is not well
> maintained, so it is not easy to draw insights from it.
>
>
> ===========
> The Proposal
> ===========
>
> Since we are building a parallel system, the natural solution seems to be:
> partition the workload ;-)
>
> We propose to define a set of components for Flink. Each component is
> maintained or tracked by one or more
> people - let's call them maintainers. It is important to note that we don't
> suggest the maintainers as an authoritative role, but
> simply as committers or contributors that visibly step up for a certain
> component, and mainly track and drive the efforts
> pertaining to that component.
>
> It is also important to realize that we do not want to suggest that people
> get less involved with certain parts and components, because
> they are not the maintainers. We simply want to make sure that each pull
> request or question or contribution has in the end
> one person (or a small set of people) responsible for catching and tracking
> it, if it was not worked on by the pro-active
> community.
>
> For some components, having multiple maintainers will be helpful. In that
> case, one maintainer should be the "chair" or "lead"
> and make sure that no issue of that component gets lost between the
> multiple maintainers.
>
>
> A maintainer's role is:
> -----------------------------
>
>   - Have an overview of which of the open pull requests relate to their
> component
>   - Drive the pull requests relating to the component to resolution
>       => Moderate the decision whether the feature should be merged
>       => Make sure the pull request gets a shepherd.
>            In many cases, the maintainers would shepherd themselves.
>       => In case the shepherd becomes inactive, the maintainers need to
> find a new shepherd.
>
>   - Have an overview of what are the known issues of their component
>   - Have an overview of what are the frequently requested features of their
> component
>
>   - Have an overview of which contributors are doing very good work in
> their component,
>     would be candidates for committers, and should be mentored towards that.
>
>   - Resolve email threads that have been brought to their attention,
> because deeper
>     component knowledge is required for that thread.
>
> A maintainer's role is NOT:
> ----------------------------------
>
>   - Review all pull requests of that component
>   - Answer every mail with questions about that component
>   - Fix all bugs and implement all features of that component
>
>
> We imagine the following way that the community and the maintainers
> interact:
> ---------------------------------------------------------------------------------------------------------
>
>   - Pull requests should be tagged by component. Since we cannot add labels
> at this point, we need
>     to rely on the following:
>      => The pull request opener should name the pull request like
> "[FLINK-XXX] [component] Title"
>      => Components can be (re) tagged by adding special comments in the
> pull request ("==> component client")
>      => With some luck, GitHub and Apache Infra will allow us to use labels
> at some point
>
>   - When pull requests are associated with a component, the maintainers
> will manage them
>     (decision whether to add, find shepherd, catch dropped pull requests)
>
>   - We assume that maintainers frequently reach out to other community
> members and ask them if they want
>     to shepherd a pull request.
>
>   - On the mailing list, everyone should feel equally empowered to answer
> and discuss.
>     If at some point in the discussion, some deep technical knowledge about
> a component is required,
>     the maintainer(s) should be drawn into the discussion.
>     Because the Mailing List infrastructure has no support to tag threads,
> here are some simple workarounds:
>
>     => One possibility is to put the maintainers' mail addresses on cc for
> the thread, so they get the mail
>           not just via the mailing list
>     => Another way would be to post something like "+maintainer runtime" in
> the thread and the "runtime"
>          maintainers would have a filter/alert on these keywords in their
> mail program.
>
>   - We assume that maintainers will reach out to community members that are
> very active and helpful in
>     a component, and will ask them if they want to be added as maintainers.
>     That will make it visible that those people are experts for that part
> of Flink.
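
The tagging conventions quoted above (the "[FLINK-XXX] [component] Title" pull request name, the "==> component client" re-tagging comment, and the "+maintainer runtime" mailing list keyword) could be recognized mechanically. The following is only a minimal sketch in Python: the regular expressions, the function names, and the rule that the latest re-tag comment wins are assumptions for illustration and are not part of the proposal.

```python
import re

# Assumed pattern for the proposed title convention "[FLINK-XXX] [component] Title".
PR_TITLE = re.compile(r"^\[(FLINK-\d+)\]\s*\[([^\]]+)\]\s*(.+)$")
# Assumed pattern for a re-tagging comment such as "==> component client".
RETAG = re.compile(r"^==>\s*component\s+(.+?)\s*$", re.MULTILINE)
# Assumed pattern for the mailing list keyword such as "+maintainer runtime".
MENTION = re.compile(r"\+maintainer\s+([\w-]+)")


def parse_pr_title(title):
    """Return (issue, component, summary), or None if the title does not follow the convention."""
    match = PR_TITLE.match(title.strip())
    if match is None:
        return None
    issue, component, summary = match.groups()
    return issue, component.strip().lower(), summary.strip()


def current_component(title, comments):
    """Component taken from the title, overridden by the most recent re-tag comment (an assumed rule)."""
    parsed = parse_pr_title(title)
    component = parsed[1] if parsed else None
    for body in comments:
        tags = RETAG.findall(body)
        if tags:
            component = tags[-1].strip().lower()
    return component


def mentioned_components(mail_body):
    """List the components whose maintainers a mail asks for via '+maintainer <component>'."""
    return [name.lower() for name in MENTION.findall(mail_body)]
```

A mail filter or a small script over the open pull requests could use functions like these to route items to the maintainers of a component. Titles that do not follow the convention return None and would still need a human to tag them.
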
>
>
> ======================================
> Maintainers: Committers and Contributors
> ======================================
>
> It helps if maintainers are committers (since we want them to resolve pull
> requests which often involves
> merging them).
>
> Components with multiple maintainers can easily have non-committer
> contributors in addition to committer
> contributors.
>
>
> ======
> JIRA
> ======
>
> Ideally, JIRA can be used to get an overview of what are the known issues
> of each component, and what are
> common feature requests. Unfortunately, the Flink JIRA is quite unorganized
> right now.
>
> A natural follow-up effort of this proposal would be to define in JIRA the
> same components as we defined here,
> and have the maintainers keep JIRA meaningful for that particular
> component. That would allow us to
> easily generate some tables out of JIRA (like top known issues per
> component, most requested features)
> and post them on the dev list once in a while as a "state of the union" report.
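
As a rough illustration of the per-component tables mentioned above, here is a minimal Python sketch. It assumes the issues have already been exported from JIRA into plain dictionaries. The field names ("key", "component", "type", "votes") and the use of votes for ranking are hypothetical choices, not something the proposal specifies.

```python
from collections import defaultdict


def top_issues_per_component(issues, issue_type, limit=3):
    """Group issues of one type by component and keep the highest-voted keys.

    `issues` is a list of dicts with the hypothetical fields
    "key", "component", "type", and "votes".
    """
    by_component = defaultdict(list)
    for issue in issues:
        if issue["type"] == issue_type:
            by_component[issue["component"]].append(issue)
    return {
        component: [
            item["key"]
            for item in sorted(items, key=lambda i: i["votes"], reverse=True)[:limit]
        ]
        for component, items in sorted(by_component.items())
    }


# Made-up sample records; the keys are placeholders, not real JIRA issues.
sample = [
    {"key": "EXAMPLE-1", "component": "runtime", "type": "Bug", "votes": 5},
    {"key": "EXAMPLE-2", "component": "runtime", "type": "Bug", "votes": 9},
    {"key": "EXAMPLE-3", "component": "client", "type": "New Feature", "votes": 2},
]
report = top_issues_per_component(sample, "Bug")
```

Calling the same function with a different issue type would give the "most requested features" view, provided the component field in JIRA is kept accurate by the maintainers.
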
>
> Initial assignment of issues to components should be made by those people
> opening the issue. The maintainer
> of that tagged component needs to change the tag, if the component was
> classified incorrectly.
>
>
> ======================================
> Initial Components and Maintainers Suggestion
> ======================================
>
> Below is a suggestion of how to define components for Flink. One goal of
> the division was to make it
> obvious for the majority of questions and contributions to which component
> they would relate. Otherwise,
> if many contributions had fuzzy component associations, we would again not
> solve the issue of having clear
> responsibilities for who would track the progress and resolution.
>
> We also looked at each component and wrote the names of some people who we
> thought were natural
> experts for the components, and thus natural candidates for maintainers.
>
> **These names are only a starting point for discussion.**
>
> Once agreed upon, the components and names of maintainers should be kept in
> the wiki and updated as
> components change and people step up or down.
>
>
> *DataSet API* (*Fabian, Greg, Gabor*)
>   - Including Hadoop compat. parts
>
> *DataStream API* (*Aljoscha, Max, Stephan*)
>
> *Runtime*
>   - Distributed Coordination (JobManager/TaskManager, Akka)  (*Till*)
>   - Local Runtime (Memory Management, State Backends, Tasks/Operators) (*Stephan*)
>   - Network (*Ufuk*)
>
> *Client/Optimizer* (*Fabian*)
>
> *Type system / Type extractor* (Timo)
>
> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>
> *Libraries*
>   - Gelly (*Vasia, Greg*)
>   - ML (*Till, Theo*)
>   - CEP (*Till*)
>   - Python (*Chesnay*)
>
> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>
> *Streaming Connectors* (*Robert*, *Aljoscha*)
>
> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>
> *Storm Compatibility Layer* (*Mathias*)
>
> *Scala shell* (*Till*)
>
> *Startup Shell Scripts* (Ufuk)
>
> *Flink Build System, Maven Files* (*Robert*)
>
> *Documentation* (Ufuk)
>
>
> Please let us know what you think about this proposal.
> Happy discussing!
>
> Greetings,
> Stephan
