flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: [PROPOSAL] Structure the Flink Open Source Development
Date Thu, 12 May 2016 16:09:39 GMT
All maintainer candidates are only proposals so far. No indication of lead
or anything so far.

Let's first see if we agree on the structure proposed here, and if we take
the components as suggested here or if we refine the list.
On 12.05.2016 17:45, "Robert Metzger" <rmetzger@apache.org> wrote:

> tl;dr: +1
>
> I also like the proposal a lot. Our community is growing at quite a fast
> pace and we need to have some structure in place to still keep track of
> everything going on.
>
> I'm happy to see that the proposal mentions cleaning up our JIRA. This is
> something that has been annoying me for quite a while, but it's too big to
> do alone. If maintainers took care of their components, that would already
> cover a lot.
>
> One question regarding the "chair" or "lead" role for components: Is the
> first name in the list of maintainers the lead?
>
> I would actually suggest waiting until all proposed maintainers have agreed
> to the proposal. It doesn't make sense to make somebody a maintainer of
> something if they disagree or are not aware of it.
>
>
>
>
> On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <mxm@apache.org>
> wrote:
>
> > +1 for the initiative. With a better process we will improve the
> > quality of the Flink development and give ourselves more time to focus.
> >
> > Could we have another category "Infrastructure"? This would concern
> > things like CI, nightly deployment of snapshots/documentation, ASF
> > Infra communication. Robert and I could be the initial maintainers
> > for that.
> >
> > On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <sewen@apache.org> wrote:
> > > Yes, Matthias, that was supposed to be you.
> > > Sorry from another guy who frequently has his name misspelled ;-)
> > >
> > > On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mjsax@apache.org> wrote:
> > >
> > >> +1 from my side.
> > >>
> > >> Happy to be the maintainer for Storm-Compatibility (at least I guess
> > >> it's me, even though the correct spelling would be with two 't's :P)
> > >>
> > >> -Matthias
> > >>
> > >> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > >> > +1 for the proposal
> > >> > On May 12, 2016 12:13 PM, "Stephan Ewen" <sewen@apache.org> wrote:
> > >> >
> > >> >> Yes, Gabor Gevay, that did refer to you!
> > >> >>
> > >> >> Sorry for the ambiguity...
> > >> >>
> > >> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.marton@gmail.com> wrote:
> > >> >>
> > >> >>> +1 for the proposal
> > >> >>> @ggevay: I do think that it refers to you. :)
> > >> >>>
> > >> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <ggab90@gmail.com> wrote:
> > >> >>>
> > >> >>>> Hello,
> > >> >>>>
> > >> >>>> There are at least three Gábors in the Flink community, :) so
> > >> >>>> assuming that the Gábor in the list of maintainers of the DataSet
> > >> >>>> API is referring to me, I'll be happy to do it. :)
> > >> >>>>
> > >> >>>> Best,
> > >> >>>> Gábor G.
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> > >> >>>>> Hi everyone!
> > >> >>>>>
> > >> >>>>> We propose to establish some lightweight structures in the Flink open
> > >> >>>>> source community and development process, to help us better handle the
> > >> >>>>> increased interest in Flink (mailing list and pull requests), while not
> > >> >>>>> overwhelming the committers, and giving users and contributors a good
> > >> >>>>> experience.
> > >> >>>>>
> > >> >>>>> This proposal is triggered by the observation that we are reaching the
> > >> >>>>> limits of where the current community can support users and guide new
> > >> >>>>> contributors. The below proposal is based on observations and ideas
> > >> >>>>> from Till, Robert, and me.
> > >> >>>>>
> > >> >>>>> ========
> > >> >>>>> Goals
> > >> >>>>> ========
> > >> >>>>>
> > >> >>>>> We try to achieve the following:
> > >> >>>>>
> > >> >>>>>   - Pull requests get handled in a timely fashion
> > >> >>>>>   - New contributors are better integrated into the community
> > >> >>>>>   - The community feels empowered on the mailing list.
> > >> >>>>>     But questions that need the attention of someone who has deep
> > >> >>>>>     knowledge of a certain part of Flink get that attention.
> > >> >>>>>   - At the same time, the committers who are knowledgeable about many
> > >> >>>>>     core parts do not get completely overwhelmed.
> > >> >>>>>   - We don't overlook threads that report critical issues.
> > >> >>>>>   - We always have a pretty good overview of what the status of certain
> > >> >>>>>     parts of the system is.
> > >> >>>>>       -> What are often encountered known issues
> > >> >>>>>       -> What are the most frequently requested features
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> ========
> > >> >>>>> Problems
> > >> >>>>> ========
> > >> >>>>>
> > >> >>>>> Looking into the process, there are two big issues:
> > >> >>>>>
> > >> >>>>> (1) Up to now, we have been relying on the fact that everything just
> > >> >>>>> "organizes itself", driven by best effort. That assumes that everyone
> > >> >>>>> feels equally responsible for every part, question, and contribution.
> > >> >>>>> At the current state, this is impossible to maintain; it overwhelms
> > >> >>>>> the committers and contributors.
> > >> >>>>>
> > >> >>>>> Example: Pull requests are picked up by whoever wants to pick them up.
> > >> >>>>> Pull requests that are a lot of work, have little chance of getting in,
> > >> >>>>> or relate to less active components are sometimes not picked up. When
> > >> >>>>> contributors are pretty loaded already, it may happen that no one
> > >> >>>>> eventually feels responsible to pick up a pull request, and it falls
> > >> >>>>> through the cracks.
> > >> >>>>>
> > >> >>>>> (2) There is no good overview of what are known shortcomings, efforts,
> > >> >>>>> and requested features for different parts of the system. This
> > >> >>>>> information exists in various people's heads, but is not easily
> > >> >>>>> accessible for new people. The Flink JIRA is not well maintained, so
> > >> >>>>> it is not easy to draw insights from it.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> ===========
> > >> >>>>> The Proposal
> > >> >>>>> ===========
> > >> >>>>>
> > >> >>>>> Since we are building a parallel system, the natural solution seems to
> > >> >>>>> be: partition the workload ;-)
> > >> >>>>>
> > >> >>>>> We propose to define a set of components for Flink. Each component is
> > >> >>>>> maintained or tracked by one or more people - let's call them
> > >> >>>>> maintainers. It is important to note that we don't suggest the
> > >> >>>>> maintainers as an authoritative role, but simply as committers or
> > >> >>>>> contributors that visibly step up for a certain component, and mainly
> > >> >>>>> track and drive the efforts pertaining to that component.
> > >> >>>>>
> > >> >>>>> It is also important to realize that we do not want to suggest that
> > >> >>>>> people get less involved with certain parts and components, because
> > >> >>>>> they are not the maintainers. We simply want to make sure that each
> > >> >>>>> pull request or question or contribution has in the end one person (or
> > >> >>>>> a small set of people) responsible for catching and tracking it, if it
> > >> >>>>> was not worked on by the pro-active community.
> > >> >>>>>
> > >> >>>>> For some components, having multiple maintainers will be helpful. In
> > >> >>>>> that case, one maintainer should be the "chair" or "lead" and make sure
> > >> >>>>> that no issue of that component gets lost between the multiple
> > >> >>>>> maintainers.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> A maintainer's role is:
> > >> >>>>> -----------------------------
> > >> >>>>>
> > >> >>>>>   - Have an overview of which of the open pull requests relate to their
> > >> >>>>>     component
> > >> >>>>>   - Drive the pull requests relating to the component to resolution
> > >> >>>>>       => Moderate the decision whether the feature should be merged
> > >> >>>>>       => Make sure the pull request gets a shepherd.
> > >> >>>>>            In many cases, the maintainers would shepherd themselves.
> > >> >>>>>       => In case the shepherd becomes inactive, the maintainers need to
> > >> >>>>>            find a new shepherd.
> > >> >>>>>
> > >> >>>>>   - Have an overview of what are the known issues of their component
> > >> >>>>>   - Have an overview of what are the frequently requested features of
> > >> >>>>>     their component
> > >> >>>>>
> > >> >>>>>   - Have an overview of which contributors are doing very good work in
> > >> >>>>>     their component, would be candidates for committers, and should be
> > >> >>>>>     mentored towards that.
> > >> >>>>>
> > >> >>>>>   - Resolve email threads that have been brought to their attention,
> > >> >>>>>     because deeper component knowledge is required for that thread.
> > >> >>>>>
> > >> >>>>> A maintainer's role is NOT:
> > >> >>>>> ----------------------------------
> > >> >>>>>
> > >> >>>>>   - Review all pull requests of that component
> > >> >>>>>   - Answer every mail with questions about that component
> > >> >>>>>   - Fix all bugs and implement all features of that component
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> We imagine the following way that the community and the maintainers
> > >> >>>>> interact:
> > >> >>>>> ---------------------------------------------------------------------
> > >> >>>>>
> > >> >>>>>   - Pull requests should be tagged by component. Since we cannot add
> > >> >>>>>     labels at this point, we need to rely on the following:
> > >> >>>>>      => The pull request opener should name the pull request like
> > >> >>>>>         "[FLINK-XXX] [component] Title"
> > >> >>>>>      => Components can be (re)tagged by adding special comments in the
> > >> >>>>>         pull request ("==> component client")
> > >> >>>>>      => With some luck, GitHub and Apache Infra will allow us to use
> > >> >>>>>         labels at some point
> > >> >>>>>
> > >> >>>>>   - When pull requests are associated with a component, the maintainers
> > >> >>>>>     will manage them (decision whether to add, find shepherd, catch
> > >> >>>>>     dropped pull requests)
> > >> >>>>>
> > >> >>>>>   - We assume that maintainers frequently reach out to other community
> > >> >>>>>     members and ask them if they want to shepherd a pull request.
> > >> >>>>>
> > >> >>>>>   - On the mailing list, everyone should feel equally empowered to
> > >> >>>>>     answer and discuss. If at some point in the discussion, some deep
> > >> >>>>>     technical knowledge about a component is required, the
> > >> >>>>>     maintainer(s) should be drawn into the discussion. Because the
> > >> >>>>>     mailing list infrastructure has no support for tagging threads,
> > >> >>>>>     here are some simple workarounds:
> > >> >>>>>
> > >> >>>>>     => One possibility is to put the maintainers' mail addresses on cc
> > >> >>>>>        for the thread, so they get the mail not just via the mailing
> > >> >>>>>        list
> > >> >>>>>     => Another way would be to post something like
> > >> >>>>>        "+maintainer runtime" in the thread, and the "runtime"
> > >> >>>>>        maintainers would have a filter/alert on these keywords in
> > >> >>>>>        their mail program.
> > >> >>>>>
> > >> >>>>>   - We assume that maintainers will reach out to community members that
> > >> >>>>>     are very active and helpful in a component, and will ask them if
> > >> >>>>>     they want to be added as maintainers. That will make it visible
> > >> >>>>>     that those people are experts for that part of Flink.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> ======================================
> > >> >>>>> Maintainers: Committers and Contributors
> > >> >>>>> ======================================
> > >> >>>>>
> > >> >>>>> It helps if maintainers are committers (since we want them to resolve
> > >> >>>>> pull requests which often involves merging them).
> > >> >>>>>
> > >> >>>>> Components with multiple maintainers can easily have non-committer
> > >> >>>>> contributors in addition to committer contributors.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> ======
> > >> >>>>> JIRA
> > >> >>>>> ======
> > >> >>>>>
> > >> >>>>> Ideally, JIRA can be used to get an overview of what are the known
> > >> >>>>> issues of each component, and what are common feature requests.
> > >> >>>>> Unfortunately, the Flink JIRA is quite unorganized right now.
> > >> >>>>>
> > >> >>>>> A natural followup effort of this proposal would be to define in JIRA
> > >> >>>>> the same components as we defined here, and have the maintainers keep
> > >> >>>>> JIRA meaningful for that particular component. That would allow us to
> > >> >>>>> easily generate some tables out of JIRA (like top known issues per
> > >> >>>>> component, most requested features) and post them on the dev list once
> > >> >>>>> in a while as a "state of the union" report.
> > >> >>>>>
> > >> >>>>> Initial assignment of issues to components should be made by those
> > >> >>>>> people opening the issue. The maintainer of that tagged component
> > >> >>>>> needs to change the tag, if the component was classified incorrectly.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> ======================================
> > >> >>>>> Initial Components and Maintainers Suggestion
> > >> >>>>> ======================================
> > >> >>>>>
> > >> >>>>> Below is a suggestion of how to define components for Flink. One goal
> > >> >>>>> of the division was to make it obvious for the majority of questions
> > >> >>>>> and contributions to which component they would relate. Otherwise, if
> > >> >>>>> many contributions had fuzzy component associations, we would again
> > >> >>>>> not solve the issue of having clear responsibilities for who would
> > >> >>>>> track the progress and resolution.
> > >> >>>>>
> > >> >>>>> We also looked at each component and wrote the names of some people who
> > >> >>>>> we thought were natural experts for the components, and thus natural
> > >> >>>>> candidates for maintainers.
> > >> >>>>>
> > >> >>>>> **These names are only a starting point for discussion.**
> > >> >>>>>
> > >> >>>>> Once agreed upon, the components and names of maintainers should be
> > >> >>>>> kept in the wiki and updated as components change and people step up
> > >> >>>>> or down.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> > >> >>>>>   - Including Hadoop compat. parts
> > >> >>>>>
> > >> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> > >> >>>>>
> > >> >>>>> *Runtime*
> > >> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
> > >> >>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators)
> > >> >>>>>     (*Stephan*)
> > >> >>>>>   - Network (*Ufuk*)
> > >> >>>>>
> > >> >>>>> *Client/Optimizer* (*Fabian*)
> > >> >>>>>
> > >> >>>>> *Type system / Type extractor* (Timo)
> > >> >>>>>
> > >> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
> > >> >>>>>
> > >> >>>>> *Libraries*
> > >> >>>>>   - Gelly (*Vasia, Greg*)
> > >> >>>>>   - ML (*Till, Theo*)
> > >> >>>>>   - CEP (*Till*)
> > >> >>>>>   - Python (*Chesnay*)
> > >> >>>>>
> > >> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> > >> >>>>>
> > >> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> > >> >>>>>
> > >> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> > >> >>>>>
> > >> >>>>> *Storm Compatibility Layer* (*Mathias*)
> > >> >>>>>
> > >> >>>>> *Scala shell* (*Till*)
> > >> >>>>>
> > >> >>>>> *Startup Shell Scripts* (Ufuk)
> > >> >>>>>
> > >> >>>>> *Flink Build System, Maven Files* (*Robert*)
> > >> >>>>>
> > >> >>>>> *Documentation* (Ufuk)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Please let us know what you think about this proposal.
> > >> >>>>> Happy discussing!
> > >> >>>>>
> > >> >>>>> Greetings,
> > >> >>>>> Stephan
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >
> > >>
> > >>
> >
>
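
The tagging conventions in the quoted proposal (pull request titles like
"[FLINK-XXX] [component] Title", re-tagging comments like
"==> component client", and mailing list call-outs like
"+maintainer runtime") lend themselves to simple pattern matching. Below is
a minimal Python sketch of how a script or mail filter might pick them up.
The regular expressions, the function names, the lowercasing of component
names, and the rule that the latest re-tag comment wins are all assumptions
made for illustration; the proposal only specifies the textual shapes.

```python
import re
from typing import Iterable, List, Optional

# Assumed shape of a title: "[FLINK-<digits>] [<component>] <title text>".
# The proposal writes "FLINK-XXX"; treating XXX as digits is an assumption.
TITLE_RE = re.compile(r"^\[FLINK-(\d+)\]\s*\[([^\]]+)\]\s*(.+)$")

# Assumed shape of a re-tagging comment line: "==> component <name>".
RETAG_RE = re.compile(r"^==>\s*component\s+(.+?)\s*$", re.MULTILINE)

# Assumed shape of a mailing list call-out: "+maintainer <name>".
MAINTAINER_RE = re.compile(r"\+maintainer\s+([\w/-]+)")


def component_from_title(title: str) -> Optional[str]:
    """Return the component named in the title, or None if it is untagged."""
    match = TITLE_RE.match(title.strip())
    return match.group(2).strip().lower() if match else None


def component_for_pull_request(title: str, comments: Iterable[str]) -> Optional[str]:
    """Start from the title tag; a later re-tag comment overrides it (assumed rule)."""
    component = component_from_title(title)
    for comment in comments:
        for match in RETAG_RE.finditer(comment):
            component = match.group(1).strip().lower()
    return component


def maintainers_called(mail_body: str) -> List[str]:
    """Return the sorted, de-duplicated components called out in a mail body."""
    return sorted({name.lower() for name in MAINTAINER_RE.findall(mail_body)})
```

For example, component_for_pull_request("[FLINK-1234] [runtime] Fix x",
["==> component client"]) yields "client", and a mail containing
"+maintainer runtime" would alert the hypothetical "runtime" filter.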
