flink-dev mailing list archives

From Kostas Tzoumas <ktzou...@apache.org>
Subject Re: [PROPOSAL] Structure the Flink Open Source Development
Date Thu, 12 May 2016 11:40:56 GMT
Big +1 from my side. I think this will help the community grow and prosper
big time!

On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mjsax@apache.org> wrote:

> +1 from my side.
>
> Happy to be the maintainer for Storm-Compatibility (at least I guess
> it's me, even though the correct spelling would be with two 't's :P)
>
> -Matthias
>
> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > +1 for the proposal
> > On May 12, 2016 12:13 PM, "Stephan Ewen" <sewen@apache.org> wrote:
> >
> >> Yes, Gabor Gevay, that did refer to you!
> >>
> >> Sorry for the ambiguity...
> >>
> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.marton@gmail.com> wrote:
> >>
> >>> +1 for the proposal
> >>> @ggevay: I do think that it refers to you. :)
> >>>
> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <ggab90@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> There are at least three Gábors in the Flink community :), so
> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
> >>>> is referring to me, I'll be happy to do it. :)
> >>>>
> >>>> Best,
> >>>> Gábor G.
> >>>>
> >>>>
> >>>>
> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> >>>>> Hi everyone!
> >>>>>
> >>>>> We propose to establish some lightweight structures in the Flink open
> >>>>> source community and development process. They should help us better
> >>>>> handle the increased interest in Flink (mailing list and pull requests)
> >>>>> without overwhelming the committers, while giving users and
> >>>>> contributors a good experience.
> >>>>>
> >>>>> This proposal is triggered by the observation that we are reaching the
> >>>>> limits of how well the current community can support users and guide
> >>>>> new contributors. The proposal below is based on observations and ideas
> >>>>> from Till, Robert, and me.
> >>>>>
> >>>>> ========
> >>>>> Goals
> >>>>> ========
> >>>>>
> >>>>> We try to achieve the following:
> >>>>>
> >>>>>   - Pull requests get handled in a timely fashion.
> >>>>>   - New contributors are better integrated into the community.
> >>>>>   - The community feels empowered on the mailing list,
> >>>>>     but questions that need the attention of someone with deep
> >>>>>     knowledge of a certain part of Flink still get that attention.
> >>>>>   - At the same time, the committers who are knowledgeable about many
> >>>>>     core parts do not get completely overwhelmed.
> >>>>>   - We don't overlook threads that report critical issues.
> >>>>>   - We always have a pretty good overview of the status of certain
> >>>>>     parts of the system:
> >>>>>       -> Which known issues are encountered most often
> >>>>>       -> Which features are requested most frequently
> >>>>>
> >>>>>
> >>>>> ========
> >>>>> Problems
> >>>>> ========
> >>>>>
> >>>>> Looking into the process, there are two big issues:
> >>>>>
> >>>>> (1) Up to now, we have been relying on everything "organizing itself",
> >>>>> driven by best effort. That assumes that everyone feels equally
> >>>>> responsible for every part, question, and contribution. In the current
> >>>>> state, this is impossible to maintain; it overwhelms the committers and
> >>>>> contributors.
> >>>>>
> >>>>> Example: Pull requests are picked up by whoever wants to pick them up.
> >>>>> Pull requests that are a lot of work, have little chance of getting in,
> >>>>> or relate to less active components are sometimes not picked up. When
> >>>>> contributors are already heavily loaded, it may happen that no one ends
> >>>>> up feeling responsible for picking up a pull request, and it falls
> >>>>> through the cracks.
> >>>>>
> >>>>> (2) There is no good overview of the known shortcomings, efforts, and
> >>>>> requested features for the different parts of the system.
> >>>>> This information exists in various people's heads, but is not easily
> >>>>> accessible to new people. The Flink JIRA is not well maintained, so it
> >>>>> is not easy to draw insights from it.
> >>>>>
> >>>>>
> >>>>> ===========
> >>>>> The Proposal
> >>>>> ===========
> >>>>>
> >>>>> Since we are building a parallel system, the natural solution seems to
> >>>>> be: partition the workload ;-)
> >>>>>
> >>>>> We propose to define a set of components for Flink. Each component is
> >>>>> maintained or tracked by one or more people; let's call them
> >>>>> maintainers. It is important to note that we don't intend maintainers
> >>>>> to be an authoritative role. They are simply committers or contributors
> >>>>> who visibly step up for a certain component, and mainly track and drive
> >>>>> the efforts pertaining to that component.
> >>>>>
> >>>>> It is also important to realize that we do not want to suggest that
> >>>>> people should get less involved with certain parts and components just
> >>>>> because they are not the maintainers. We simply want to make sure that
> >>>>> each pull request, question, or contribution ultimately has one person
> >>>>> (or a small set of people) responsible for catching and tracking it, if
> >>>>> it was not already worked on by the proactive community.
> >>>>>
> >>>>> For some components, having multiple maintainers will be helpful. In
> >>>>> that case, one maintainer should be the "chair" or "lead" and make sure
> >>>>> that no issue of that component gets lost between the multiple
> >>>>> maintainers.
> >>>>>
> >>>>>
> >>>>> A maintainer's role is:
> >>>>> -----------------------------
> >>>>>
> >>>>>   - Have an overview of which of the open pull requests relate to their
> >>>>>     component
> >>>>>   - Drive the pull requests relating to the component to resolution
> >>>>>       => Moderate the decision whether the feature should be merged
> >>>>>       => Make sure the pull request gets a shepherd.
> >>>>>          In many cases, the maintainers would act as shepherds themselves.
> >>>>>       => In case the shepherd becomes inactive, the maintainers need to
> >>>>>          find a new shepherd.
> >>>>>
> >>>>>   - Have an overview of the known issues of their component
> >>>>>   - Have an overview of the frequently requested features of their
> >>>>>     component
> >>>>>
> >>>>>   - Have an overview of which contributors are doing very good work in
> >>>>>     their component, would be candidates for committers, and should be
> >>>>>     mentored towards that.
> >>>>>
> >>>>>   - Resolve email threads that have been brought to their attention
> >>>>>     because deeper component knowledge is required for that thread.
> >>>>>
> >>>>> A maintainer's role is NOT:
> >>>>> ----------------------------------
> >>>>>
> >>>>>   - Review all pull requests of that component
> >>>>>   - Answer every mail with questions about that component
> >>>>>   - Fix all bugs and implement all features of that component
> >>>>>
> >>>>>
> >>>>> We imagine that the community and the maintainers interact as follows:
> >>>>> ----------------------------------------------------------------------
> >>>>>
> >>>>>   - Pull requests should be tagged by component. Since we cannot add
> >>>>>     labels at this point, we need to rely on the following:
> >>>>>      => The pull request opener should name the pull request like
> >>>>>         "[FLINK-XXX] [component] Title"
> >>>>>      => Components can be (re)tagged by adding special comments in the
> >>>>>         pull request ("==> component client")
> >>>>>      => With some luck, GitHub and Apache Infra will allow us to use
> >>>>>         labels at some point
> >>>>>
> >>>>>   - When pull requests are associated with a component, the maintainers
> >>>>>     will manage them (decide whether to add them, find a shepherd, catch
> >>>>>     dropped pull requests).
> >>>>>
> >>>>>   - We assume that maintainers frequently reach out to other community
> >>>>>     members and ask them if they want to shepherd a pull request.
> >>>>>
> >>>>>   - On the mailing list, everyone should feel equally empowered to
> >>>>>     answer and discuss. If at some point in the discussion deep
> >>>>>     technical knowledge about a component is required, the
> >>>>>     maintainer(s) should be drawn into the discussion. Because the
> >>>>>     mailing list infrastructure has no support for tagging threads,
> >>>>>     here are some simple workarounds:
> >>>>>
> >>>>>     => One possibility is to put the maintainers' mail addresses on cc
> >>>>>        for the thread, so they get the mail directly and not just via
> >>>>>        the mailing list.
> >>>>>     => Another way would be to post something like "+maintainer runtime"
> >>>>>        in the thread, and the "runtime" maintainers would have a
> >>>>>        filter/alert on these keywords in their mail program.
> >>>>>
> >>>>>   - We assume that maintainers will reach out to community members who
> >>>>>     are very active and helpful in a component, and will ask them if
> >>>>>     they want to be added as maintainers. That will make it visible that
> >>>>>     those people are experts in that part of Flink.
> >>>>>
> >>>>>
> >>>>> ======================================
> >>>>> Maintainers: Committers and Contributors
> >>>>> ======================================
> >>>>>
> >>>>> It helps if maintainers are committers, since we want them to resolve
> >>>>> pull requests, which often involves merging them.
> >>>>>
> >>>>> Components with multiple maintainers can easily have non-committer
> >>>>> contributors in addition to committers.
> >>>>>
> >>>>>
> >>>>> ======
> >>>>> JIRA
> >>>>> ======
> >>>>>
> >>>>> Ideally, JIRA can be used to get an overview of the known issues of
> >>>>> each component and of common feature requests. Unfortunately, the Flink
> >>>>> JIRA is quite unorganized right now.
> >>>>>
> >>>>> A natural follow-up effort of this proposal would be to define in JIRA
> >>>>> the same components as we defined here, and have the maintainers keep
> >>>>> JIRA meaningful for their particular component. That would allow us to
> >>>>> easily generate some tables out of JIRA (like top known issues per
> >>>>> component, or most requested features) and post them on the dev list
> >>>>> once in a while as a "state of the union" report.
> >>>>>
> >>>>> Initial assignment of issues to components should be made by the people
> >>>>> opening the issues. The maintainer of the tagged component needs to
> >>>>> change the tag if the issue was classified incorrectly.
> >>>>>
> >>>>>
> >>>>> ======================================
> >>>>> Initial Components and Maintainers Suggestion
> >>>>> ======================================
> >>>>>
> >>>>> Below is a suggestion of how to define components for Flink. One goal of
> >>>>> the division was to make it obvious, for the majority of questions and
> >>>>> contributions, which component they relate to. Otherwise, if many
> >>>>> contributions had fuzzy component associations, we would again not
> >>>>> solve the issue of having clear responsibilities for who tracks their
> >>>>> progress and resolution.
> >>>>>
> >>>>> We also looked at each component and wrote the names of some people who
> >>>>> we thought were natural experts for the components, and thus natural
> >>>>> candidates for maintainers.
> >>>>>
> >>>>> **These names are only a starting point for discussion.**
> >>>>>
> >>>>> Once agreed upon, the components and names of maintainers should be kept
> >>>>> in the wiki and updated as components change and people step up or down.
> >>>>>
> >>>>>
> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> >>>>>   - Including Hadoop compat. parts
> >>>>>
> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> >>>>>
> >>>>> *Runtime*
> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> >>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators) (*Stephan*)
> >>>>>   - Network (*Ufuk*)
> >>>>>
> >>>>> *Client/Optimizer* (*Fabian*)
> >>>>>
> >>>>> *Type system / Type extractor* (*Timo*)
> >>>>>
> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> >>>>>
> >>>>> *Libraries*
> >>>>>   - Gelly (*Vasia, Greg*)
> >>>>>   - ML (*Till, Theo*)
> >>>>>   - CEP (*Till*)
> >>>>>   - Python (*Chesnay*)
> >>>>>
> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> >>>>>
> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> >>>>>
> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> >>>>>
> >>>>> *Storm Compatibility Layer* (*Mathias*)
> >>>>>
> >>>>> *Scala shell* (*Till*)
> >>>>>
> >>>>> *Startup Shell Scripts* (*Ufuk*)
> >>>>>
> >>>>> *Flink Build System, Maven Files* (*Robert*)
> >>>>>
> >>>>> *Documentation* (*Ufuk*)
> >>>>>
> >>>>>
> >>>>> Please let us know what you think about this proposal.
> >>>>> Happy discussing!
> >>>>>
> >>>>> Greetings,
> >>>>> Stephan
> >>>>
> >>>
> >>
> >
>
>
