spark-dev mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Spark Improvement Proposals
Date Sun, 09 Oct 2016 22:10:14 GMT
Users instead of people, sure.  Committers and contributors are (or at least
should be) a subset of users.

Non goals, sure. I don't care what the name is, but we need to clearly say
e.g. 'no we are not maintaining compatibility with XYZ right now'.

API, what I care most about is whether it allows me to accomplish the
goals. Arguing about how ugly or pretty it is can be saved for design/
implementation imho.

Strategy, this is necessary because otherwise goals can be out of line with
reality.  Don't propose goals you don't have at least some idea of how to
implement.

Rejected strategies, given that committers are the only ones I'm saying
should formally submit SPARKLIs or SIPs, if they put junk in a required
section then slap them down for it and tell them to fix it.

On Oct 9, 2016 4:36 PM, "Matei Zaharia" <matei.zaharia@gmail.com> wrote:

> Yup, this is the stuff that I found unclear. Thanks for clarifying here,
> but we should also clarify it in the writeup. In particular:
>
> - Goals needs to be about user-facing behavior ("people" is broad)
>
> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up
> one of these and say "Spark's developers have officially rejected X, which
> our awesome system has".
>
> - For user-facing stuff, I think you need a section on API. Virtually all
> other *IPs I've seen have that.
>
> - I'm still not sure why the strategy section is needed if the purpose is
> to define user-facing behavior -- unless this is the strategy for setting
> the goals or for defining the API. That sounds squarely like a design doc
> issue. In some sense, who cares whether the proposal is technically
> feasible right now? If it's infeasible, that will be discovered later
> during design and implementation. Same thing with rejected strategies --
> listing some of those is definitely useful sometimes, but if you make this
> a *required* section, people are just going to fill it in with bogus stuff
> (I've seen this happen before).
>
> Matei
>
> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <cody@koeninger.org> wrote:
> >
> > So to focus the discussion on the specific strategy I'm suggesting,
> > documented at
> >
> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >
> > "Goals: What must this allow people to do, that they can't currently?"
> >
> > Is it unclear that this is focusing specifically on people-visible
> > behavior?
> >
> > Rejected goals -  are important because otherwise people keep trying
> > to argue about scope.  Of course you can change things later with a
> > different SIP and different vote, the point is to focus.
> >
> > Use cases - are something that people are going to bring up in
> > discussion.  If they aren't clearly documented as a goal ("This must
> > allow me to connect using SSL"), they should be added.
> >
> > Internal architecture - if the people who need specific behavior are
> > implementers of other parts of the system, that's fine.
> >
> > Rejected strategies - If you have none of these, you have no evidence
> > that the proponent didn't just go with the first thing they had in
> > mind (or have already implemented), which is a big problem currently.
> > Approval isn't binding as to specifics of implementation, so these
> > aren't handcuffs.  The goals are the contract, the strategy is
> > evidence that contract can actually be met.
> >
> > Design docs - I'm not touching design docs.  The markdown file I
> > linked specifically says of the strategy section "This is not a full
> > design document."  Is this unclear?  Design docs can be worked on
> > obviously, but that's not what I'm concerned with here.
> >
> >
> >
> >
> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> >> Hi Cody,
> >>
> >> I think this would be a lot more concrete if we had a more detailed template
> >> for SIPs. Right now, it's not super clear what's in scope -- e.g. are they
> >> a way to solicit feedback on the user-facing behavior or on the internals?
> >> "Goals" can cover both things. I've been thinking of SIPs more as Product
> >> Requirements Docs (PRDs), which focus on *what* a code change should do as
> >> opposed to how.
> >>
> >> In particular, here are some things that you may or may not consider in
> >> scope for SIPs:
> >>
> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on
> >> user-visible behavior (e.g. "system supports SQL window functions" or
> >> "system continues working if one node fails"). BTW I wouldn't say "rejected
> >> goals" because some of them might become goals later, so we're not
> >> definitively rejecting them.
> >>
> >> - Public API: Probably should be included in most SIPs unless it's too large
> >> to fully specify then (e.g. "let's add an ML library").
> >>
> >> - Use cases: I usually find this very useful in PRDs to better communicate
> >> the goals.
> >>
> >> - Internal architecture: This is usually *not* a thing users can easily
> >> comment on and it sounds more like a design doc item. Of course it's
> >> important to show that the SIP is feasible to implement. One exception,
> >> however, is that I think we'll have some SIPs primarily on internals (e.g.
> >> if somebody wants to refactor Spark's query optimizer or something).
> >>
> >> - Rejected strategies: I personally wouldn't put this, because what's the
> >> point of voting to reject a strategy before you've really begun designing
> >> and implementing something? What if you discover that the strategy is
> >> actually better when you start doing stuff?
> >>
> >> At a super high level, it depends on whether you want the SIPs to be PRDs
> >> for getting some quick feedback on the goals of a feature before it is
> >> designed, or something more like full-fledged design docs (just a more
> >> visible design doc for bigger changes). I looked at Kafka's KIPs, and they
> >> actually seem to be more like design docs. This can work too but it does
> >> require more work from the proposer and it can lead to the same problems you
> >> mentioned with people already having a design and implementation in mind.
> >>
> >> Basically, the question is, are you trying to iterate faster on design by
> >> adding a step for user feedback earlier? Or are you just trying to make
> >> design docs for key features more visible (and their approval more formal)?
> >>
> >> BTW note that in either case, I'd like to have a template for design docs
> >> too, which should also include goals. I think that would've avoided some of
> >> the issues you brought up.
> >>
> >> Matei
> >>
> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <cody@koeninger.org> wrote:
> >>
> >> Here's my specific proposal (meta-proposal?)
> >>
> >> Spark Improvement Proposals (SIP)
> >>
> >>
> >> Background:
> >>
> >> The current problem is that design and implementation of large features are
> >> often done in private, before soliciting user feedback.
> >>
> >> When feedback is solicited, it is often as to detailed design specifics, not
> >> focused on goals.
> >>
> >> When implementation does take place after design, there is often
> >> disagreement as to what goals are or are not in scope.
> >>
> >> This results in commits that don't fully meet user needs.
> >>
> >>
> >> Goals:
> >>
> >> - Ensure user, contributor, and committer goals are clearly identified and
> >> agreed upon, before implementation takes place.
> >>
> >> - Ensure that a technically feasible strategy is chosen that is likely to
> >> meet the goals.
> >>
> >>
> >> Rejected Goals:
> >>
> >> - SIPs are not for detailed design.  Design by committee doesn't work.
> >>
> >> - SIPs are not for every change.  We don't need that much process.
> >>
> >>
> >> Strategy:
> >>
> >> My suggestion is outlined as a Spark Improvement Proposal process documented
> >> at
> >>
> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>
> >> Specifics of Jira manipulation are an implementation detail we can figure
> >> out.
> >>
> >> I'm suggesting voting; the need here is for a _clear_ outcome.
> >>
> >>
> >> Rejected Strategies:
> >>
> >> Having someone who understands the problem implement it first works, but
> >> only if significant iteration after user feedback is allowed.
> >>
> >> Historically this has been problematic due to pressure to limit public api
> >> changes.
> >>
> >>
> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <rxin@databricks.com> wrote:
> >>>
> >>> Alright, looks like there is quite a bit of support. We should wait to
> >>> hear from more people too.
> >>>
> >>> To push this forward, Cody and I will be working together in the next
> >>> couple of weeks to come up with a concrete, detailed proposal on what this
> >>> entails, and then we can discuss the specific proposal as well.
> >>>
> >>>
> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <cody@koeninger.org> wrote:
> >>>>
> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major
> >>>> user-facing or cross-cutting changes, not minor feature adds.
> >>>>
> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos
> >>>> <stavros.kontopoulos@lightbend.com> wrote:
> >>>>>
> >>>>> +1 to the SIP label as long as it does not slow down things and it
> >>>>> targets optimizing efforts, coordination etc. For example really small
> >>>>> features should not need to go through this process (assuming they don't
> >>>>> touch public interfaces)  or re-factorings and hope it will be kept this
> >>>>> way. So as a guideline doc should be provided, like in the KIP case.
> >>>>>
> >>>>> IMHO so far aside from tagging things and linking them elsewhere simply
> >>>>> having design docs and prototypes implementations in PRs is not something
> >>>>> that has not worked so far. What is really a pain in many projects out there
> >>>>> is discontinuity in progress of PRs, missing features, slow reviews which is
> >>>>> understandable to some extent... it is not only about Spark but things can
> >>>>> be improved for sure for this project in particular as already stated.
> >>>>>
> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <cody@koeninger.org>
> >>>>> wrote:
> >>>>>>
> >>>>>> +1 to adding an SIP label and linking it from the website.  I think it
> >>>>>> needs
> >>>>>>
> >>>>>> - template that focuses it towards soliciting user goals / non goals
> >>>>>> - clear resolution as to which strategy was chosen to pursue.  I'd
> >>>>>> recommend a vote.
> >>>>>>
> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think
> >>>>>> it's directly relevant to the SIP idea so I'll clarify here, and split
> >>>>>> a thread for the other discussion per Nicholas' request.
> >>>>>>
> >>>>>> I meant changing public user interfaces.  I think the first design is
> >>>>>> unlikely to be right, because it's done at a time when you have the
> >>>>>> least information.  As a user, I find it considerably more frustrating
> >>>>>> to be unable to use a tool to get my job done, than I do having to
> >>>>>> make minor changes to my code in order to take advantage of features.
> >>>>>> I've seen committers be seriously reluctant to allow changes to
> >>>>>> @experimental code that are needed in order for it to really work
> >>>>>> right.  You need to be able to iterate, and if people on both sides of
> >>>>>> the fence aren't going to respect that some newer apis are subject to
> >>>>>> change, then why even mark them as such?
> >>>>>>
> >>>>>> Ideally a finished SIP should give me a checklist of things that an
> >>>>>> implementation must do, and things that it doesn't need to do.
> >>>>>> Contributors/committers should be seriously discouraged from putting
> >>>>>> out a version 0.1 that doesn't have at least a prototype
> >>>>>> implementation of all those things, especially if they're then going
> >>>>>> to argue against interface changes necessary to get the rest of
> >>>>>> the things done in the 0.2 version.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <rxin@databricks.com>
> >>>>>> wrote:
> >>>>>>> I like the lightweight proposal to add a SIP label.
> >>>>>>>
> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to
> >>>>>>> track the list of major changes, but that never really materialized due to
> >>>>>>> the overhead. Adding a SIP label on major JIRAs and then linking to them
> >>>>>>> prominently on the Spark website makes a lot of sense.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia
> >>>>>>> <matei.zaharia@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> For the improvement proposals, I think one major point was to make them
> >>>>>>>> really visible to users who are not contributors, so we should do more than
> >>>>>>>> sending stuff to dev@. One very lightweight idea is to have a new type of
> >>>>>>>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
> >>>>>>>> http://spark.apache.org. I also like the idea of SIP and design doc
> >>>>>>>> templates (in fact many projects have them).
> >>>>>>>>
> >>>>>>>> Matei
> >>>>>>>>
> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <rxin@databricks.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I called Cody last night and talked about some of the topics in his email.
> >>>>>>>> It became clear to me Cody genuinely cares about the project.
> >>>>>>>>
> >>>>>>>> Some of the frustrations come from the success of the project itself
> >>>>>>>> becoming very "hot", and it is difficult to get clarity from people who
> >>>>>>>> don't dedicate all their time to Spark. In fact, it is in some ways similar
> >>>>>>>> to scaling an engineering team in a successful startup: old processes that
> >>>>>>>> worked well might not work so well when it gets to a certain size, cultures
> >>>>>>>> can get diluted, building culture vs building process, etc.
> >>>>>>>>
> >>>>>>>> I also really like to have a more visible process for larger changes,
> >>>>>>>> especially major user facing API changes. Historically we upload design docs
> >>>>>>>> for major changes, but it is not always consistent and difficult to quality
> >>>>>>>> of the docs, due to the volunteering nature of the organization.
> >>>>>>>>
> >>>>>>>> Some of the more concrete ideas we discussed focus on building a culture
> >>>>>>>> to improve clarity:
> >>>>>>>>
> >>>>>>>> - Process: Large changes should have design docs posted on JIRA. One thing
> >>>>>>>> Cody and I didn't discuss but an idea that just came to me is we should
> >>>>>>>> create a design doc template for the project and ask everybody to follow.
> >>>>>>>> The design doc template should also explicitly list goals and non-goals, to
> >>>>>>>> make design doc more consistent.
> >>>>>>>>
> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some
> >>>>>>>> changes, but again very inconsistent. Just posting something on JIRA isn't
> >>>>>>>> sufficient, because there are simply too many JIRAs and the signal gets lost
> >>>>>>>> in the noise. While this is generally impossible to enforce because we can't
> >>>>>>>> force all volunteers to conform to a process (or they might not even be
> >>>>>>>> aware of this),  those who are more familiar with the project can help by
> >>>>>>>> emailing the dev@ when they see something that hasn't been.
> >>>>>>>>
> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design
> >>>>>>>> doc should serve as the base for discussion and is by no means the final
> >>>>>>>> design. Of course, this does not mean the author has to accept every
> >>>>>>>> feedback. They should also be comfortable accepting / rejecting ideas on
> >>>>>>>> technical grounds.
> >>>>>>>>
> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have
> >>>>>>>> some monthly Google hangouts that are open to the world. I am actually not
> >>>>>>>> sure how well this will work, because of the volunteering nature and we need
> >>>>>>>> to adjust for timezones for people across the globe, but it seems worth
> >>>>>>>> trying.
> >>>>>>>>
> >>>>>>>> - Culture: Contributors (including committers) should be more direct in
> >>>>>>>> setting expectations, including whether they are working on a specific
> >>>>>>>> issue, whether they will be working on a specific issue, and whether an
> >>>>>>>> issue or pr or jira should be rejected. Most people I know in this community
> >>>>>>>> are nice and don't enjoy telling other people no, but it is often more
> >>>>>>>> annoying to a contributor to not know anything than getting a no.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia
> >>>>>>>> <matei.zaharia@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that
> >>>>>>>>> solicits user input on new APIs. For what it's worth, I don't think
> >>>>>>>>> committers are trying to minimize their own work -- every committer cares
> >>>>>>>>> about making the software useful for users. However, it is always hard to
> >>>>>>>>> get user input and so it helps to have this kind of process. I've certainly
> >>>>>>>>> looked at the *IPs a lot in other software I use just to see the biggest
> >>>>>>>>> things on the roadmap.
> >>>>>>>>>
> >>>>>>>>> When you're talking about "changing interfaces", are you talking about
> >>>>>>>>> public or internal APIs? I do think many people hate changing public APIs
> >>>>>>>>> and I actually think that's for the best of the project. That's a technical
> >>>>>>>>> debate, but basically, the worst thing when you're using a piece of software
> >>>>>>>>> is that the developers constantly ask you to rewrite your app to update to a
> >>>>>>>>> new version (and thus benefit from bug fixes, etc). Cue anyone who's used
> >>>>>>>>> Protobuf, or Guava. The "let's get everyone to change their code this
> >>>>>>>>> release" model works well within a single large company, but doesn't work
> >>>>>>>>> well for a community, which is why nearly all *very* widely used programming
> >>>>>>>>> interfaces (I'm talking things like Java standard library, Windows API, etc)
> >>>>>>>>> almost *never* break backwards compatibility. All this is done within reason
> >>>>>>>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Stavros Kontopoulos
> >>>>> Senior Software Engineer
> >>>>> Lightbend, Inc.
> >>>>> p:  +30 6977967274
> >>>>> e: stavros.kontopoulos@lightbend.com
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
>
>
