flink-dev mailing list archives

From Simone Robutti <simone.robu...@radicalbit.io>
Subject Re: [PROPOSAL] Structure the Flink Open Source Development
Date Tue, 31 May 2016 19:44:27 GMT
Overseer? Supervisor? Warden?



2016-05-31 21:23 GMT+02:00 Robert Metzger <rmetzger@apache.org>:

> Good point. I haven't thought about this name clash.
> However, I wonder whether it is clear from the context whether we are
> talking about pull request shepherding or component shepherding.
>
> Are there any other ideas for the name? If nobody else has concerns
> regarding the "maintainer" name, we can of course keep it.
>
> On Tue, May 31, 2016 at 7:57 PM, Chesnay Schepler <chesnay@apache.org>
> wrote:
>
> > so are we discarding the other "shepherd" role then?
> >
> >
> > On 31.05.2016 19:47, Robert Metzger wrote:
> >
> >> Hi,
> >>
> >> to keep this discussion going, I pasted Stephan's Component proposal into
> >> the Wiki:
> >> https://cwiki.apache.org/confluence/display/FLINK/Components+and+Shepherds
> >>
> >> Also, I suggest renaming the "maintainer" to "shepherd" to reflect that
> >> the committers and the PMC are still in charge and the shepherd is only
> >> keeping a closer eye on some of the components (basically reflecting the
> >> structure we already have in the community a bit more officially).
> >>
> >> Let's discuss the proposed shepherds for the components based on Stephan's
> >> proposals.
> >>
> >> Please edit the wiki or write here if you want to add or remove yourself
> >> for a component.
> >> If somebody who has been proposed as a shepherd hasn't reacted by the end
> >> of this week, I'll remove them (for now; I just want to ensure that we
> >> don't make somebody a shepherd who isn't aware).
> >>
> >> Regards,
> >> Robert
> >>
> >>
> >> On Tue, May 17, 2016 at 2:10 PM, Stephan Ewen <sewen@apache.org> wrote:
> >>
> >>> Hi!
> >>>
> >>> Thanks for all the comments and the positive response! It looks like
> >>> everyone is in favor so far.
> >>>
> >>> Next, I would add a section on this structure to the Wiki and the "How to
> >>> Contribute" guide, incorporating the component split of Optimizer and
> >>> Client.
> >>>
> >>> After that, let's get started with gathering candidates for the
> >>> maintainer roles. The ones suggested in the mail would be a starting point.
> >>>
> >>> Greetings,
> >>> Stephan
> >>>
> >>>
> >>> On Mon, May 16, 2016 at 11:48 AM, Kostas Tzoumas <ktzoumas@apache.org>
> >>> wrote:
> >>>
> >>>> +1 to Henry's comment: once this makes it to the wiki/website, the wording
> >>>> needs to make it clear that the governance model is unchanged.
> >>>>
> >>>> On Mon, May 16, 2016 at 10:02 AM, Theodore Vasiloudis <
> >>>> theodoros.vasiloudis@gmail.com> wrote:
> >>>>
> >>>>> I like the idea of having maintainers as well; hopefully we can
> >>>>> streamline the reviewing process.
> >>>>>
> >>>>> I can of course volunteer for the FlinkML component.
> >>>>> As I've mentioned before, I'd love to get one more committer willing to
> >>>>> review PRs in FlinkML; by my last count we were up to ~20 open ML-related
> >>>>> PRs.
> >>>>>
> >>>>> Regards,
> >>>>> Theodore
> >>>>>
> >>>>> On Mon, May 16, 2016 at 2:17 AM, Henry Saputra <henry.saputra@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> The maintainers concept is a good idea to make sure PRs are moved along
> >>>>>> smoothly.
> >>>>>> But we need to make sure that this is not an additional hierarchy on top
> >>>>>> of the Flink PMC.
> >>>>>> This will keep us in the spirit of the ASF's "community over code".
> >>>>>>
> >>>>>> Please do add me as a cluster management maintainer.
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tuesday, May 10, 2016, Stephan Ewen <sewen@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi everyone!
> >>>>>>>
> >>>>>>> We propose to establish some lightweight structures in the Flink open
> >>>>>>> source community and development process, to help us better handle the
> >>>>>>> increased interest in Flink (mailing list and pull requests), while not
> >>>>>>> overwhelming the committers, and giving users and contributors a good
> >>>>>>> experience.
> >>>>>>>
> >>>>>>> This proposal is triggered by the observation that we are reaching the
> >>>>>>> limits of where the current community can support users and guide new
> >>>>>>> contributors. The proposal below is based on observations and ideas
> >>>>>>> from Till, Robert, and me.
> >>>>>>>
> >>>>>>> ========
> >>>>>>> Goals
> >>>>>>> ========
> >>>>>>>
> >>>>>>> We try to achieve the following:
> >>>>>>>
> >>>>>>>    - Pull requests get handled in a timely fashion.
> >>>>>>>    - New contributors are better integrated into the community.
> >>>>>>>    - The community feels empowered on the mailing list, but questions
> >>>>>>>      that need the attention of someone with deep knowledge of a
> >>>>>>>      certain part of Flink get that attention.
> >>>>>>>    - At the same time, the committers that are knowledgeable about many
> >>>>>>>      core parts do not get completely overwhelmed.
> >>>>>>>    - We don't overlook threads that report critical issues.
> >>>>>>>    - We always have a pretty good overview of the status of certain
> >>>>>>>      parts of the system:
> >>>>>>>        -> What are the frequently encountered known issues
> >>>>>>>        -> What are the most frequently requested features
> >>>>>>>
> >>>>>>>
> >>>>>>> ========
> >>>>>>> Problems
> >>>>>>> ========
> >>>>>>>
> >>>>>>> Looking into the process, there are two big issues:
> >>>>>>>
> >>>>>>> (1) Up to now, we have been relying on the fact that everything just
> >>>>>>> "organizes itself", driven by best effort. That assumes that everyone
> >>>>>>> feels equally responsible for every part, question, and contribution.
> >>>>>>> At the current state, this is impossible to maintain; it overwhelms the
> >>>>>>> committers and contributors.
> >>>>>>>
> >>>>>>> Example: Pull requests are picked up by whoever wants to pick them up.
> >>>>>>> Pull requests that are a lot of work, have little chance of getting in,
> >>>>>>> or relate to less active components are sometimes not picked up. When
> >>>>>>> contributors are pretty loaded already, it may happen that no one
> >>>>>>> eventually feels responsible to pick up a pull request, and it falls
> >>>>>>> through the cracks.
> >>>>>>>
> >>>>>>> (2) There is no good overview of the known shortcomings, efforts, and
> >>>>>>> requested features for different parts of the system. This information
> >>>>>>> exists in various people's heads, but is not easily accessible for new
> >>>>>>> people. The Flink JIRA is not well maintained, so it is not easy to
> >>>>>>> draw insights from it.
> >>>>>>>
> >>>>>>>
> >>>>>>> ===========
> >>>>>>> The Proposal
> >>>>>>> ===========
> >>>>>>>
> >>>>>>> Since we are building a parallel system, the natural solution seems to
> >>>>>>> be: partition the workload ;-)
> >>>>>>>
> >>>>>>> We propose to define a set of components for Flink. Each component is
> >>>>>>> maintained or tracked by one or more people - let's call them
> >>>>>>> maintainers. It is important to note that we don't suggest the
> >>>>>>> maintainers as an authoritative role, but simply as committers or
> >>>>>>> contributors that visibly step up for a certain component, and mainly
> >>>>>>> track and drive the efforts pertaining to that component.
> >>>>>>>
> >>>>>>> It is also important to realize that we do not want to suggest that
> >>>>>>> people get less involved with certain parts and components because
> >>>>>>> they are not the maintainers. We simply want to make sure that each
> >>>>>>> pull request, question, or contribution has in the end one person (or
> >>>>>>> a small set of people) responsible for catching and tracking it, if it
> >>>>>>> was not worked on by the pro-active community.
> >>>>>>>
> >>>>>>> For some components, having multiple maintainers will be helpful. In
> >>>>>>> that case, one maintainer should be the "chair" or "lead" and make sure
> >>>>>>> that no issue of that component gets lost between the multiple
> >>>>>>> maintainers.
> >>>>>>>
> >>>>>>>
> >>>>>>> A maintainer's role is:
> >>>>>>> -----------------------------
> >>>>>>>
> >>>>>>>    - Have an overview of which of the open pull requests relate to
> >>>>>>>      their component
> >>>>>>>    - Drive the pull requests relating to the component to resolution
> >>>>>>>        => Moderate the decision whether the feature should be merged
> >>>>>>>        => Make sure the pull request gets a shepherd.
> >>>>>>>           In many cases, the maintainers would shepherd themselves.
> >>>>>>>        => In case the shepherd becomes inactive, the maintainers need
> >>>>>>>           to find a new shepherd.
> >>>>>>>
> >>>>>>>    - Have an overview of the known issues of their component
> >>>>>>>    - Have an overview of the frequently requested features of their
> >>>>>>>      component
> >>>>>>>
> >>>>>>>    - Have an overview of which contributors are doing very good work in
> >>>>>>>      their component, would be candidates for committers, and should be
> >>>>>>>      mentored towards that.
> >>>>>>>
> >>>>>>>    - Resolve email threads that have been brought to their attention
> >>>>>>>      because deeper component knowledge is required for that thread.
> >>>>>>>
> >>>>>>> A maintainer's role is NOT:
> >>>>>>> ----------------------------------
> >>>>>>>
> >>>>>>>    - Review all pull requests of that component
> >>>>>>>    - Answer every mail with questions about that component
> >>>>>>>    - Fix all bugs and implement all features of that component
> >>>>>>>
> >>>>>>>
> >>>>>>> We imagine the following way that the community and the maintainers
> >>>>>>> interact:
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>>
> >>>>>>>    - Pull requests should be tagged by component. Since we cannot add
> >>>>>>>      labels at this point, we need to rely on the following:
> >>>>>>>       => The pull request opener should name the pull request like
> >>>>>>>          "[FLINK-XXX] [component] Title"
> >>>>>>>       => Components can be (re)tagged by adding special comments in
> >>>>>>>          the pull request ("==> component client")
> >>>>>>>       => With some luck, GitHub and Apache Infra will allow us to use
> >>>>>>>          labels at some point
> >>>>>>>
> >>>>>>>    - When pull requests are associated with a component, the
> >>>>>>>      maintainers will manage them (decision whether to add, find
> >>>>>>>      shepherd, catch dropped pull requests)
> >>>>>>>
> >>>>>>>    - We assume that maintainers frequently reach out to other community
> >>>>>>>      members and ask them if they want to shepherd a pull request.
> >>>>>>>
> >>>>>>>    - On the mailing list, everyone should feel equally empowered to
> >>>>>>>      answer and discuss. If at some point in the discussion, some deep
> >>>>>>>      technical knowledge about a component is required, the
> >>>>>>>      maintainer(s) should be drawn into the discussion.
> >>>>>>>      Because the mailing list infrastructure has no support for
> >>>>>>>      tagging threads, here are some simple workarounds:
> >>>>>>>
> >>>>>>>      => One possibility is to put the maintainers' mail addresses on
> >>>>>>>         cc for the thread, so they get the mail not just via the
> >>>>>>>         mailing list
> >>>>>>>      => Another way would be to post something like
> >>>>>>>         "+maintainer runtime" in the thread, and the "runtime"
> >>>>>>>         maintainers would have a filter/alert on these keywords in
> >>>>>>>         their mail program.
> >>>>>>>
> >>>>>>>    - We assume that maintainers will reach out to community members
> >>>>>>>      that are very active and helpful in a component, and will ask
> >>>>>>>      them if they want to be added as maintainers. That will make it
> >>>>>>>      visible that those people are experts for that part of Flink.
> >>>>>>>
> >>>>>>>
> >>>>>>> ======================================
> >>>>>>> Maintainers: Committers and Contributors
> >>>>>>> ======================================
> >>>>>>>
> >>>>>>> It helps if maintainers are committers (since we want them to resolve
> >>>>>>> pull requests, which often involves merging them).
> >>>>>>>
> >>>>>>> Components with multiple maintainers can easily have non-committer
> >>>>>>> contributors in addition to committer contributors.
> >>>>>>>
> >>>>>>>
> >>>>>>> ======
> >>>>>>> JIRA
> >>>>>>> ======
> >>>>>>>
> >>>>>>> Ideally, JIRA can be used to get an overview of the known issues of
> >>>>>>> each component and the common feature requests. Unfortunately, the
> >>>>>>> Flink JIRA is quite unorganized right now.
> >>>>>>>
> >>>>>>> A natural follow-up effort of this proposal would be to define in JIRA
> >>>>>>> the same components as we defined here, and have the maintainers keep
> >>>>>>> JIRA meaningful for that particular component. That would allow us to
> >>>>>>> easily generate some tables out of JIRA (like top known issues per
> >>>>>>> component, most requested features) and post them on the dev list once
> >>>>>>> in a while as a "state of the union" report.
> >>>>>>>
> >>>>>>> Initial assignment of issues to components should be made by the
> >>>>>>> people opening the issue. The maintainer of the tagged component needs
> >>>>>>> to change the tag if the component was classified incorrectly.
> >>>>>>>
> >>>>>>>
> >>>>>>> ======================================
> >>>>>>> Initial Components and Maintainers Suggestion
> >>>>>>> ======================================
> >>>>>>>
> >>>>>>> Below is a suggestion of how to define components for Flink. One goal
> >>>>>>> of the division was to make it obvious for the majority of questions
> >>>>>>> and contributions to which component they would relate. Otherwise, if
> >>>>>>> many contributions had fuzzy component associations, we would again not
> >>>>>>> solve the issue of having clear responsibilities for who would track
> >>>>>>> the progress and resolution.
> >>>>>>>
> >>>>>>> We also looked at each component and wrote down the names of some
> >>>>>>> people who we thought were natural experts for the components, and thus
> >>>>>>> natural candidates for maintainers.
> >>>>>>>
> >>>>>>> **These names are only a starting point for discussion.**
> >>>>>>>
> >>>>>>> Once agreed upon, the components and names of maintainers should be
> >>>>>>> kept in the wiki and updated as components change and people step up or
> >>>>>>> down.
> >>>>>>>
> >>>>>>>
> >>>>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> >>>>>>>    - Including Hadoop compat. parts
> >>>>>>>
> >>>>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> >>>>>>>
> >>>>>>> *Runtime*
> >>>>>>>    - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> >>>>>>>    - Local Runtime (Memory Management, State Backends, Tasks/Operators)
> >>>>>>>      (*Stephan*)
> >>>>>>>    - Network (*Ufuk*)
> >>>>>>>
> >>>>>>> *Client/Optimizer* (*Fabian*)
> >>>>>>>
> >>>>>>> *Type system / Type extractor* (Timo)
> >>>>>>>
> >>>>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> >>>>>>>
> >>>>>>> *Libraries*
> >>>>>>>    - Gelly (*Vasia, Greg*)
> >>>>>>>    - ML (*Till, Theo*)
> >>>>>>>    - CEP (*Till*)
> >>>>>>>    - Python (*Chesnay*)
> >>>>>>>
> >>>>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> >>>>>>>
> >>>>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> >>>>>>>
> >>>>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> >>>>>>>
> >>>>>>> *Storm Compatibility Layer* (*Mathias*)
> >>>>>>>
> >>>>>>> *Scala shell* (*Till*)
> >>>>>>>
> >>>>>>> *Startup Shell Scripts* (Ufuk)
> >>>>>>>
> >>>>>>> *Flink Build System, Maven Files* (*Robert*)
> >>>>>>>
> >>>>>>> *Documentation* (Ufuk)
> >>>>>>>
> >>>>>>>
> >>>>>>> Please let us know what you think about this proposal.
> >>>>>>> Happy discussing!
> >>>>>>>
> >>>>>>> Greetings,
> >>>>>>> Stephan
> >>>>>>>
> >>>>>>>
> >
>
