flink-dev mailing list archives

From: Ufuk Celebi <...@apache.org>
Subject: Re: [PROPOSAL] Structure the Flink Open Source Development
Date: Thu, 12 May 2016 12:12:35 GMT
Hey Stephan!

Thanks to you and the others who started this. I really like the
proposal and I'm happy to see my name on some components.

So, +1.

I'd say let's wait until the end of the week/beginning of next week to
see if there is any disagreement with the proposal in the community
(doesn't look like it so far ;-)). Then we can continue to execute
this. :-)

– Ufuk


On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <sewen@apache.org> wrote:
> Yes, Matthias, that was supposed to be you.
> Sorry from another guy who frequently has his name misspelled ;-)
>
> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <mjsax@apache.org> wrote:
>
>> +1 from my side.
>>
>> Happy to be the maintainer for Storm-Compatibility (at least I guess
>> it's me, even though the correct spelling would be with two 't's :P)
>>
>> -Matthias
>>
>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
>> > +1 for the proposal
>> > On May 12, 2016 12:13 PM, "Stephan Ewen" <sewen@apache.org> wrote:
>> >
>> >> Yes, Gabor Gevay, that did refer to you!
>> >>
>> >> Sorry for the ambiguity...
>> >>
>> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.marton@gmail.com> wrote:
>> >>
>> >>> +1 for the proposal
>> >>> @ggevay: I do think that it refers to you. :)
>> >>>
>> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <ggab90@gmail.com> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> There are at least three Gábors in the Flink community :), so
>> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
>> >>>> is referring to me, I'll be happy to do it. :)
>> >>>>
>> >>>> Best,
>> >>>> Gábor G.
>> >>>>
>> >>>>
>> >>>>
>> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>> >>>>> Hi everyone!
>> >>>>>
>> >>>>> We propose to establish some lightweight structures in the Flink open
>> >>>>> source community and development process, to help us better handle the
>> >>>>> increased interest in Flink (mailing list and pull requests) while not
>> >>>>> overwhelming the committers and while giving users and contributors a
>> >>>>> good experience.
>> >>>>>
>> >>>>> This proposal is triggered by the observation that we are reaching the
>> >>>>> limits of how far the current community can support users and guide
>> >>>>> new contributors. The proposal below is based on observations and ideas
>> >>>>> from Till, Robert, and me.
>> >>>>>
>> >>>>> ========
>> >>>>> Goals
>> >>>>> ========
>> >>>>>
>> >>>>> We try to achieve the following:
>> >>>>>
>> >>>>>   - Pull requests get handled in a timely fashion.
>> >>>>>   - New contributors are better integrated into the community.
>> >>>>>   - The community feels empowered on the mailing list, but questions
>> >>>>>     that need the attention of someone with deep knowledge of a
>> >>>>>     certain part of Flink get that attention.
>> >>>>>   - At the same time, the committers who are knowledgeable about many
>> >>>>>     core parts do not get completely overwhelmed.
>> >>>>>   - We don't overlook threads that report critical issues.
>> >>>>>   - We always have a pretty good overview of the status of the
>> >>>>>     different parts of the system:
>> >>>>>       -> What are the frequently encountered known issues?
>> >>>>>       -> What are the most frequently requested features?
>> >>>>>
>> >>>>>
>> >>>>> ========
>> >>>>> Problems
>> >>>>> ========
>> >>>>>
>> >>>>> Looking into the process, there are two big issues:
>> >>>>>
>> >>>>> (1) Up to now, we have been relying on the fact that everything just
>> >>>>> "organizes itself", driven by best effort. That assumes that everyone
>> >>>>> feels equally responsible for every part, question, and contribution.
>> >>>>> In the current state, this is impossible to maintain; it overwhelms the
>> >>>>> committers and contributors.
>> >>>>>
>> >>>>> Example: Pull requests are picked up by whoever wants to pick them up.
>> >>>>> Pull requests that are a lot of work, have little chance of getting in,
>> >>>>> or relate to less active components are sometimes not picked up. When
>> >>>>> contributors are already heavily loaded, it may happen that eventually
>> >>>>> no one feels responsible for picking up a pull request, and it falls
>> >>>>> through the cracks.
>> >>>>>
>> >>>>> (2) There is no good overview of the known shortcomings, ongoing
>> >>>>> efforts, and requested features for the different parts of the system.
>> >>>>> This information exists in various people's heads but is not easily
>> >>>>> accessible to newcomers. The Flink JIRA is not well maintained, so it
>> >>>>> is hard to draw insights from it.
>> >>>>>
>> >>>>>
>> >>>>> ===========
>> >>>>> The Proposal
>> >>>>> ===========
>> >>>>>
>> >>>>> Since we are building a parallel system, the natural solution seems to
>> >>>>> be: partition the workload ;-)
>> >>>>>
>> >>>>> We propose to define a set of components for Flink. Each component is
>> >>>>> maintained or tracked by one or more people - let's call them
>> >>>>> maintainers. It is important to note that we don't intend the
>> >>>>> maintainer to be an authoritative role. Maintainers are simply
>> >>>>> committers or contributors who visibly step up for a certain component
>> >>>>> and mainly track and drive the efforts pertaining to that component.
>> >>>>>
>> >>>>> It is also important to realize that we do not want to suggest that
>> >>>>> people should get less involved with certain parts and components
>> >>>>> because they are not the maintainers. We simply want to make sure that
>> >>>>> each pull request, question, or contribution ultimately has one person
>> >>>>> (or a small set of people) responsible for catching and tracking it,
>> >>>>> if it was not already picked up by the pro-active community.
>> >>>>>
>> >>>>> For some components, having multiple maintainers will be helpful. In
>> >>>>> that case, one maintainer should be the "chair" or "lead" and make sure
>> >>>>> that no issue of that component gets lost between the multiple
>> >>>>> maintainers.
>> >>>>>
>> >>>>>
>> >>>>> A maintainer's role is:
>> >>>>> -----------------------------
>> >>>>>
>> >>>>>   - Have an overview of which open pull requests relate to their
>> >>>>>     component
>> >>>>>   - Drive the pull requests relating to the component to resolution
>> >>>>>       => Moderate the decision whether the feature should be merged
>> >>>>>       => Make sure the pull request gets a shepherd.
>> >>>>>            In many cases, the maintainers would shepherd it themselves.
>> >>>>>       => In case the shepherd becomes inactive, the maintainers need to
>> >>>>>            find a new shepherd.
>> >>>>>
>> >>>>>   - Have an overview of the known issues of their component
>> >>>>>   - Have an overview of the frequently requested features of their
>> >>>>>     component
>> >>>>>
>> >>>>>   - Have an overview of which contributors are doing very good work in
>> >>>>>     their component, would be candidates for committers, and should be
>> >>>>>     mentored towards that.
>> >>>>>
>> >>>>>   - Resolve email threads that have been brought to their attention
>> >>>>>     because deeper component knowledge is required for that thread.
>> >>>>>
>> >>>>> A maintainer's role is NOT:
>> >>>>> ----------------------------------
>> >>>>>
>> >>>>>   - Review all pull requests of that component
>> >>>>>   - Answer every mail with questions about that component
>> >>>>>   - Fix all bugs and implement all features of that component
>> >>>>>
>> >>>>> We imagine the community and the maintainers interacting in the
>> >>>>> following way:
>> >>>>> ---------------------------------------------------------------------
>> >>>>>
>> >>>>>   - Pull requests should be tagged by component. Since we cannot add
>> >>>>>     labels at this point, we need to rely on the following:
>> >>>>>      => The pull request opener should name the pull request like
>> >>>>>         "[FLINK-XXX] [component] Title"
>> >>>>>      => Components can be (re-)tagged by adding special comments to
>> >>>>>         the pull request ("==> component client")
>> >>>>>      => With some luck, GitHub and Apache Infra will allow us to use
>> >>>>>         labels at some point
>> >>>>>
>> >>>>>   - When pull requests are associated with a component, the maintainers
>> >>>>>     will manage them (decide whether to add them, find a shepherd,
>> >>>>>     catch dropped pull requests)
>> >>>>>
>> >>>>>   - We assume that maintainers frequently reach out to other community
>> >>>>>     members and ask them if they want to shepherd a pull request.
>> >>>>>
>> >>>>>   - On the mailing list, everyone should feel equally empowered to
>> >>>>>     answer and discuss. If at some point in the discussion deep
>> >>>>>     technical knowledge about a component is required, the
>> >>>>>     maintainer(s) should be drawn into the discussion. Because the
>> >>>>>     mailing list infrastructure has no support for tagging threads,
>> >>>>>     here are some simple workarounds:
>> >>>>>
>> >>>>>     => One possibility is to put the maintainers' mail addresses on cc
>> >>>>>        for the thread, so they get the mail directly and not just via
>> >>>>>        the mailing list
>> >>>>>     => Another way would be to post something like "+maintainer
>> >>>>>        runtime" in the thread, and the "runtime" maintainers would
>> >>>>>        have a filter/alert on these keywords in their mail program.
>> >>>>>
>> >>>>>   - We assume that maintainers will reach out to community members who
>> >>>>>     are very active and helpful in a component and ask them if they
>> >>>>>     want to be added as maintainers. That will make it visible that
>> >>>>>     those people are experts for that part of Flink.
>> >>>>>
>> >>>>>
>> >>>>> ======================================
>> >>>>> Maintainers: Committers and Contributors
>> >>>>> ======================================
>> >>>>>
>> >>>>> It helps if maintainers are committers (since we want them to resolve
>> >>>>> pull requests, which often involves merging them).
>> >>>>>
>> >>>>> Components with multiple maintainers can easily have non-committer
>> >>>>> contributors in addition to committer contributors.
>> >>>>>
>> >>>>>
>> >>>>> ======
>> >>>>> JIRA
>> >>>>> ======
>> >>>>>
>> >>>>> Ideally, JIRA can be used to get an overview of the known issues of
>> >>>>> each component and of the common feature requests. Unfortunately, the
>> >>>>> Flink JIRA is quite unorganized right now.
>> >>>>>
>> >>>>> A natural follow-up effort of this proposal would be to define in JIRA
>> >>>>> the same components as we defined here, and have the maintainers keep
>> >>>>> JIRA meaningful for their particular component. That would allow us to
>> >>>>> easily generate some tables out of JIRA (like top known issues per
>> >>>>> component, most requested features) and post them on the dev list once
>> >>>>> in a while as a "state of the union" report.
>> >>>>>
>> >>>>> Initial assignment of issues to components should be made by the
>> >>>>> people opening the issues. The maintainer of the tagged component needs
>> >>>>> to change the tag if the component was classified incorrectly.
>> >>>>>
>> >>>>>
>> >>>>> ======================================
>> >>>>> Initial Components and Maintainers Suggestion
>> >>>>> ======================================
>> >>>>>
>> >>>>> Below is a suggestion of how to define components for Flink. One goal
>> >>>>> of the division was to make it obvious, for the majority of questions
>> >>>>> and contributions, which component they relate to. Otherwise, if many
>> >>>>> contributions had fuzzy component associations, we would again not
>> >>>>> solve the issue of having clear responsibilities for who tracks the
>> >>>>> progress and resolution.
>> >>>>>
>> >>>>> We also looked at each component and wrote down the names of some
>> >>>>> people who we thought were natural experts for the component, and thus
>> >>>>> natural candidates for maintainers.
>> >>>>>
>> >>>>> **These names are only a starting point for discussion.**
>> >>>>>
>> >>>>> Once agreed upon, the components and names of maintainers should be
>> >>>>> kept in the wiki and updated as components change and people step up
>> >>>>> or down.
>> >>>>>
>> >>>>>
>> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
>> >>>>>   - Including Hadoop compat. parts
>> >>>>>
>> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
>> >>>>>
>> >>>>> *Runtime*
>> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka) (*Till*)
>> >>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators)
>> >>>>>     (*Stephan*)
>> >>>>>   - Network (*Ufuk*)
>> >>>>>
>> >>>>> *Client/Optimizer* (*Fabian*)
>> >>>>>
>> >>>>> *Type system / Type extractor* (*Timo*)
>> >>>>>
>> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>> >>>>>
>> >>>>> *Libraries*
>> >>>>>   - Gelly (*Vasia, Greg*)
>> >>>>>   - ML (*Till, Theo*)
>> >>>>>   - CEP (*Till*)
>> >>>>>   - Python (*Chesnay*)
>> >>>>>
>> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>> >>>>>
>> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
>> >>>>>
>> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>> >>>>>
>> >>>>> *Storm Compatibility Layer* (*Mathias*)
>> >>>>>
>> >>>>> *Scala shell* (*Till*)
>> >>>>>
>> >>>>> *Startup Shell Scripts* (*Ufuk*)
>> >>>>>
>> >>>>> *Flink Build System, Maven Files* (*Robert*)
>> >>>>>
>> >>>>> *Documentation* (*Ufuk*)
>> >>>>>
>> >>>>>
>> >>>>> Please let us know what you think about this proposal.
>> >>>>> Happy discussing!
>> >>>>>
>> >>>>> Greetings,
>> >>>>> Stephan
>> >>>>
>> >>>
>> >>
>> >
>>
>>
