flink-dev mailing list archives

From "Matthias J. Sax" <mj...@apache.org>
Subject Re: [PROPOSAL] Structure the Flink Open Source Development
Date Thu, 12 May 2016 11:27:56 GMT
+1 from my side.

Happy to be the maintainer for Storm-Compatibility (at least I guess
it's me, even though the correct spelling would be with two 't's :P)

-Matthias

On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> +1 for the proposal
> On May 12, 2016 12:13 PM, "Stephan Ewen" <sewen@apache.org> wrote:
> 
>> Yes, Gabor Gevay, that did refer to you!
>>
>> Sorry for the ambiguity...
>>
>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.marton@gmail.com> wrote:
>>
>>> +1 for the proposal
>>> @ggevay: I do think that it refers to you. :)
>>>
>>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <ggab90@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> There are at least three Gábors in the Flink community,  :) so
>>>> assuming that the Gábor in the list of maintainers of the DataSet API
>>>> is referring to me, I'll be happy to do it. :)
>>>>
>>>> Best,
>>>> Gábor G.
>>>>
>>>>
>>>>
>>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>>>>> Hi everyone!
>>>>>
>>>>> We propose to establish some lightweight structures in the Flink open
>>>>> source community and development process,
>>>>> to help us better handle the increased interest in Flink (mailing list and
>>>>> pull requests), while not overwhelming the
>>>>> committers, and giving users and contributors a good experience.
>>>>>
>>>>> This proposal is triggered by the observation that we are reaching the
>>>>> limits of where the current community can support
>>>>> users and guide new contributors. The below proposal is based on
>>>>> observations and ideas from Till, Robert, and me.
>>>>>
>>>>> ========
>>>>> Goals
>>>>> ========
>>>>>
>>>>> We try to achieve the following
>>>>>
>>>>>   - Pull requests get handled in a timely fashion
>>>>>   - New contributors are better integrated into the community
>>>>>   - The community feels empowered on the mailing list.
>>>>>     But questions that need the attention of someone who has deep
>>>>>     knowledge of a certain part of Flink get that attention.
>>>>>   - At the same time, the committers that are knowledgeable about many core
>>>>>     parts do not get completely overwhelmed.
>>>>>   - We don't overlook threads that report critical issues.
>>>>>   - We always have a pretty good overview of what the status of certain
>>>>>     parts of the system is.
>>>>>       -> What are the often encountered known issues
>>>>>       -> What are the most frequently requested features
>>>>>
>>>>> ========
>>>>> Problems
>>>>> ========
>>>>>
>>>>> Looking into the process, there are two big issues:
>>>>>
>>>>> (1) Up to now, we have been relying on the fact that everything just
>>>>> "organizes itself", driven by best effort. That assumes
>>>>> that everyone feels equally responsible for every part, question, and
>>>>> contribution. In the current state, this is impossible
>>>>> to maintain; it overwhelms the committers and contributors.
>>>>>
>>>>> Example: Pull requests are picked up by whoever wants to pick them up. Pull
>>>>> requests that are a lot of work, have little
>>>>> chance of getting in, or relate to less active components are sometimes not
>>>>> picked up. When contributors are pretty
>>>>> loaded already, it may happen that no one eventually feels responsible to
>>>>> pick up a pull request, and it falls through the cracks.
>>>>>
>>>>> (2) There is no good overview of what are known shortcomings, efforts, and
>>>>> requested features for different parts of the system.
>>>>> This information exists in various people's heads, but is not easily
>>>>> accessible for new people. The Flink JIRA is not well
>>>>> maintained, so it is not easy to draw insights from it.
>>>>>
>>>>>
>>>>> ===========
>>>>> The Proposal
>>>>> ===========
>>>>>
>>>>> Since we are building a parallel system, the natural solution seems to be:
>>>>> partition the workload ;-)
>>>>>
>>>>> We propose to define a set of components for Flink. Each component is
>>>>> maintained or tracked by one or more
>>>>> people - let's call them maintainers. It is important to note that we don't
>>>>> suggest the maintainers as an authoritative role, but
>>>>> simply as committers or contributors that visibly step up for a certain
>>>>> component, and mainly track and drive the efforts
>>>>> pertaining to that component.
>>>>>
>>>>> It is also important to realize that we do not want to suggest that people
>>>>> get less involved with certain parts and components, because
>>>>> they are not the maintainers. We simply want to make sure that each pull
>>>>> request or question or contribution has in the end
>>>>> one person (or a small set of people) responsible for catching and tracking
>>>>> it, if it was not worked on by the pro-active
>>>>> community.
>>>>>
>>>>> For some components, having multiple maintainers will be helpful. In that
>>>>> case, one maintainer should be the "chair" or "lead"
>>>>> and make sure that no issue of that component gets lost between the
>>>>> multiple maintainers.
>>>>>
>>>>>
>>>>> A maintainer's role is:
>>>>> -----------------------------
>>>>>
>>>>>   - Have an overview of which of the open pull requests relate to their
>>>>>     component
>>>>>   - Drive the pull requests relating to the component to resolution
>>>>>       => Moderate the decision whether the feature should be merged
>>>>>       => Make sure the pull request gets a shepherd.
>>>>>            In many cases, the maintainers would shepherd themselves.
>>>>>       => In case the shepherd becomes inactive, the maintainers need to
>>>>>          find a new shepherd.
>>>>>
>>>>>   - Have an overview of what are the known issues of their component
>>>>>   - Have an overview of what are the frequently requested features of their
>>>>>     component
>>>>>
>>>>>   - Have an overview of which contributors are doing very good work in
>>>>>     their component,
>>>>>     would be candidates for committers, and should be mentored towards that.
>>>>>
>>>>>   - Resolve email threads that have been brought to their attention,
>>>>>     because deeper
>>>>>     component knowledge is required for that thread.
>>>>>
>>>>> A maintainer's role is NOT:
>>>>> ----------------------------------
>>>>>
>>>>>   - Review all pull requests of that component
>>>>>   - Answer every mail with questions about that component
>>>>>   - Fix all bugs and implement all features of that component
>>>>>
>>>>>
>>>>> We imagine the following way that the community and the maintainers
>>>>> interact:
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>>   - Pull requests should be tagged by component. Since we cannot add labels
>>>>>     at this point, we need to rely on the following:
>>>>>      => The pull request opener should name the pull request like
>>>>>         "[FLINK-XXX] [component] Title"
>>>>>      => Components can be (re) tagged by adding special comments in the
>>>>>         pull request ("==> component client")
>>>>>      => With some luck, GitHub and Apache Infra will allow us to use labels
>>>>>         at some point
>>>>>
>>>>>   - When pull requests are associated with a component, the maintainers
>>>>>     will manage them
>>>>>     (decision whether to add, find shepherd, catch dropped pull requests)
>>>>>
>>>>>   - We assume that maintainers frequently reach out to other community
>>>>>     members and ask them if they want
>>>>>     to shepherd a pull request.
>>>>>
>>>>>   - On the mailing list, everyone should feel equally empowered to answer
>>>>>     and discuss.
>>>>>     If at some point in the discussion, some deep technical knowledge about
>>>>>     a component is required,
>>>>>     the maintainer(s) should be drawn into the discussion.
>>>>>     Because the Mailing List infrastructure has no support to tag threads,
>>>>>     here are some simple workarounds:
>>>>>
>>>>>     => One possibility is to put the maintainers' mail addresses on cc for
>>>>>        the thread, so they get the mail
>>>>>        not just via the mailing list
>>>>>     => Another way would be to post something like "+maintainer runtime" in
>>>>>        the thread and the "runtime"
>>>>>        maintainers would have a filter/alert on these keywords in their
>>>>>        mail program.
>>>>>
>>>>>   - We assume that maintainers will reach out to community members that are
>>>>>     very active and helpful in
>>>>>     a component, and will ask them if they want to be added as maintainers.
>>>>>     That will make it visible that those people are experts for that part
>>>>>     of Flink.
>>>>>
>>>>>
>>>>> ======================================
>>>>> Maintainers: Committers and Contributors
>>>>> ======================================
>>>>>
>>>>> It helps if maintainers are committers (since we want them to resolve pull
>>>>> requests, which often involves
>>>>> merging them).
>>>>>
>>>>> Components with multiple maintainers can easily have non-committer
>>>>> contributors in addition to committer
>>>>> contributors.
>>>>>
>>>>>
>>>>> ======
>>>>> JIRA
>>>>> ======
>>>>>
>>>>> Ideally, JIRA can be used to get an overview of what are the known issues
>>>>> of each component, and what are
>>>>> common feature requests. Unfortunately, the Flink JIRA is quite unorganized
>>>>> right now.
>>>>>
>>>>> A natural followup effort of this proposal would be to define in JIRA the
>>>>> same components as we defined here,
>>>>> and have the maintainers keep JIRA meaningful for that particular
>>>>> component. That would allow us to
>>>>> easily generate some tables out of JIRA (like top known issues per
>>>>> component, most requested features) and
>>>>> post them on the dev list once in a while as a "state of the union" report.
>>>>>
>>>>> Initial assignment of issues to components should be made by those people
>>>>> opening the issue. The maintainer
>>>>> of that tagged component needs to change the tag, if the component was
>>>>> classified incorrectly.
>>>>>
>>>>>
>>>>> ======================================
>>>>> Initial Components and Maintainers Suggestion
>>>>> ======================================
>>>>>
>>>>> Below is a suggestion of how to define components for Flink. One goal of
>>>>> the division was to make it
>>>>> obvious for the majority of questions and contributions to which component
>>>>> they would relate. Otherwise,
>>>>> if many contributions had fuzzy component associations, we would again not
>>>>> solve the issue of having clear
>>>>> responsibilities for who would track the progress and resolution.
>>>>>
>>>>> We also looked at each component and wrote the names of some people who we
>>>>> thought were natural
>>>>> experts for the components, and thus natural candidates for maintainers.
>>>>>
>>>>> **These names are only a starting point for discussion.**
>>>>>
>>>>> Once agreed upon, the components and names of maintainers should be kept in
>>>>> the wiki and updated as
>>>>> components change and people step up or down.
>>>>>
>>>>>
>>>>> *DataSet API* (*Fabian, Greg, Gabor*)
>>>>>   - Including Hadoop compat. parts
>>>>>
>>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
>>>>>
>>>>> *Runtime*
>>>>>   - Distributed Coordination (JobManager/TaskManager, Akka)  (*Till*)
>>>>>   - Local Runtime (Memory Management, State Backends, Tasks/Operators) (*Stephan*)
>>>>>   - Network (*Ufuk*)
>>>>>
>>>>> *Client/Optimizer* (*Fabian*)
>>>>>
>>>>> *Type system / Type extractor* (Timo)
>>>>>
>>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>>>>>
>>>>> *Libraries*
>>>>>   - Gelly (*Vasia, Greg*)
>>>>>   - ML (*Till, Theo*)
>>>>>   - CEP (*Till*)
>>>>>   - Python (*Chesnay*)
>>>>>
>>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>>>>>
>>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
>>>>>
>>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>>>>>
>>>>> *Storm Compatibility Layer* (*Mathias*)
>>>>>
>>>>> *Scala shell* (*Till*)
>>>>>
>>>>> *Startup Shell Scripts* (Ufuk)
>>>>>
>>>>> *Flink Build System, Maven Files* (*Robert*)
>>>>>
>>>>> *Documentation* (Ufuk)
>>>>>
>>>>>
>>>>> Please let us know what you think about this proposal.
>>>>> Happy discussing!
>>>>>
>>>>> Greetings,
>>>>> Stephan
>>>>
>>>
>>
> 
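
Editor's note, not part of the original thread: the tagging conventions quoted in the proposal (the pull request title "[FLINK-XXX] [component] Title", the retag comment "==> component client", and the mailing list keyword "+maintainer runtime") could be picked up mechanically. The following is only a minimal sketch under stated assumptions: the regular expressions, the function names, and the issue number in the usage example are hypothetical, and nothing here reflects tooling the Flink community actually used.

```python
import re

# Assumed shapes, copied from the examples quoted in the proposal:
#   PR title:    "[FLINK-XXX] [component] Title"
#   PR comment:  "==> component client"
#   List mail:   "+maintainer runtime"
PR_TITLE = re.compile(r"^\[(FLINK-\d+)\]\s*\[([^\]]+)\]\s*(.+)$")
PR_RETAG = re.compile(r"^==>[ \t]*component[ \t]+(\S.*?)[ \t]*$", re.MULTILINE)
MAIL_TAG = re.compile(r"\+maintainer\s+([\w-]+)")


def parse_pr_title(title):
    """Return (issue, component, summary), or None if the title
    does not follow the assumed convention."""
    m = PR_TITLE.match(title.strip())
    if m is None:
        return None
    issue, component, summary = m.groups()
    return issue, component.strip().lower(), summary.strip()


def component_for_pr(title, comments):
    """Component named in the title, overridden by the last
    '==> component <name>' comment, if there is one."""
    parsed = parse_pr_title(title)
    component = parsed[1] if parsed else None
    for body in comments:
        for retag in PR_RETAG.findall(body):
            component = retag.lower()
    return component


def maintainer_tags(mail_body):
    """All components requested via '+maintainer <component>'."""
    return [tag.lower() for tag in MAIL_TAG.findall(mail_body)]
```

A hypothetical call such as component_for_pr("[FLINK-1234] [runtime] Fix buffer leak", ["==> component client"]) would report the retagged component rather than the one in the title, which matches the proposal's idea that a comment can correct an initial classification.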

