beam-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Davor Bonaci <da...@google.com.INVALID>
Subject Re: [REMINDER] Technical discussion on the mailing list
Date Thu, 06 Oct 2016 06:29:59 GMT
Daniel, so glad you are starting to contribute to Beam! It was great
talking with you in person back in May. Welcome!

--

There are lots of different things mentioned here; I'll try to address them
separately.

The first use of AutoValue should have been discussed on the dev@ mailing
list. I think the main reason for the discussion is a bit different --
AutoValue has a non-trivial tradeoff -- compile complexity vs. boilerplate
code. For example, AutoValue may degrade IDE experience for some
contributors. If we'd go in depth on this, I'm sure we'd find opposing
opinions on the use of AutoValue. This tradeoff should have been discussed
on the dev@ list, followed by a community decision.

Note that this has happened *before* the JdbcIO work. Since AutoValue has
been already used elsewhere in the project, there was no real reason not to
use it in JdbcIO, as appropriate. Therefore, I think JB and Eugene did
everything right! Second, third or thousandth usage of a concept doesn't
require any particular discussion. They didn't make anything worse. Their
discussion is totally appropriate for code review.

Now, as Daniel points out, I think it is not right to ask a contributor to
change his PR to use AutoValue when none of the existing IO connectors use
it. This is making a too high standard. In fact, it is desirable for new
contributions to follow already established patterns, instead of inventing
something new. If we want to change a pattern, we should do it as a
separate effort across the board.

On the other hand, dev@ discussion wouldn't have helped to prevent review
comments/discussions. Let's say we have had the discussion, and a new
contributor comes a year later. Should we ask her to read all discussions
that ever happened in the project to learn everything she might need? Of
course not! She should follow already established patterns and learn any
specifics during code review. And then, best practices should be documented
on the website.

To summarize, a few things could have been better:
* Discussion of the first use of AutoValue on dev@.
* Avoiding overzealous core review comments.
* Changing a pattern should have been done by filing several starter tasks
in JIRA.

--

There are also several different proposals for altering a part of the
workflow.

> code review comments not making to a list

We have >1000 PRs so far, with at least a dozen comments on average, with
pace increasing. This is >10,000 emails, most of which are "fix a typo".
This leads into information overload, with actual information being missed.

If someone wants this extra information -- just clicking the Watch button
in the GitHub UI will make it happen!

> creating new JIRA and opening PR to dev@

These currently go to commits@. This would have resulted in another 1,700
email threads compared to <150 now.

Generally speaking, *all* of this is already available to anyone who wants
to receive it. However, anyone I know that has tried, has given up very
quickly ;). If anybody is concerned, we can create several new lists for
this traffic -- but we shouldn't repurpose dev@ for it.

> I feel I’m missing things as there is significant amount of things not
happening on a list

I think "feeling of missing things" is totally valid. I feel that too, as
well as almost everybody else.

My best answer is -- we should realize that we are an extremely large and
complex project, with >100 contributors and >20 people working on it full
time. Nobody can follow every SDK, every runner, every IO connector, every
pull request, every comment that all contributors make each and every day.

While nobody can follow everything, everything is being followed by
multiple people. And, we need to be accountable to each other to surface
everything relevant to the dev@ list. And I believe that is already
happening the vast majority of time. This is just one example where it
didn't happen.

--

All that said, there are certainly areas for improvement. If anyone has
specific ideas, please reach out! I'd love to discuss them in detail and
propose improvements to the wider community.

Thanks!


On Wed, Oct 5, 2016 at 6:16 PM, Thomas Weise <thw@apache.org> wrote:
>
> How about sending just the notifications for creating new JIRA and opening
> PR to dev@ so that those that are interested can start watching?
>
> Thanks,
> Thomas
>
> On Wed, Oct 5, 2016 at 5:33 PM, Dan Halperin <dhalperi@google.com.invalid>
> wrote:
>
> > On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp <dkulp@apache.org> wrote:
> >
> > > I just want to give a little more context to this….  I’ve been
lurking on
> > > this list for several months now reading everything that’s going on.
> >  From
> > > Apache’s standpoint, that should be a “very good start” for getting to
> > know
> > > what is happening in a project.
> > >
> > > On my last PR, Eugene commented about using the AutoValue pattern for
> > part
> > > of it which caught me off guard.   None of the other IO’s in master
were
> > > using it, there wasn’t any discussion on this list about it, I had no
> > idea
> > > what it was about.   So I asked JB to make sure I hadn’t missed
anything.
> > >
> >
> > > Anyway, this is one of the main concerns I have with Beam’s PR work
flow,
> > > I feel I’m missing things as there is significant amount of things not
> > > happening on a list.   The initial pull request is going to the
commits
> > > list (ok, would prefer the dev list, but at least its on a list).
> > However,
> > > none of the comments or discussions or anything that is occurring as
part
> > > of the review is making it to any list.   The only people that “learn”
> > from
> > > the reviews are the reviewers and the person who initiated the PR
unless
> > > they go into each and every PR and read the comments (and find the
news
> > > ones and such).    With my Apache hat on, this bothers me.
> >
> >
> > Anyway, I don’t really understand why the full github integration wasn’t
> > > setup for the beam PR’s so that the comments would come back to the
lists
> > > as well (and JIRA, BTW).
> > >
> >
> > This part confuses me. We've been told that discussions on JIRA, even
> > though they are emailed to the mailing lists, don't count as happening
on
> > the mailing list. So why would github integration be helpful vs just
more
> > spam?
> >
> > As another example, the comments on PR1003 are very applicable to anyone
> > > looking into writing IO’s and they could learn about some of the “best
> > > practices” presented there.
> >
> >
> > As Beam grows during its incubation, we are moving a lot of knowledge to
> > documentation, but yes -- right now, most of the I/O related practices
live
> > in Eugene's and my head (and now, JB's!). We're working on it, and hope
to
> > dramatically improve documentation for source authors in the next
quarter.
> >
> > For AutoValue specifically, this is by no means codified and it is
> > DEFINITELY not mandatory. Eugene and JB just experimented with it in the
> > last few days and decided it was useful in a few cases. We do (or did,
> > before this thread) need to have an actual discussion on the mailing
list
> > before moving forward further towards making it policy.
> >
> > Right now Ben Chambers is trying to apply AutoValue in places that need
> > templated types and struggling with multiple ?s, so the discussion may
need
> > to continue! ...
> >
> > Thanks,
> > Dan
> >
> > That’s basically the background as to why JB sent this.  I was confused
and
> > > bugged him.   :-)
> > >
> > > Dan
> > >
> > >
> > >
> > > > On Oct 5, 2016, at 1:51 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> > > wrote:
> > > >
> > > > Hi team,
> > > >
> > > > I would like to excuse myself to have forgotten to discuss and share
> > > with you a technical point and generally speaking do a small reminder.
> > > >
> > > > When we work with Eugene on the JdbcIO, we experimented AutoValue to
> > > deal with IO configuration. AutoValue provides a nice way to reduce
and
> > > limit the boilerplate code required by the IO configuration.
> > > > We used AutoValue in JdbcIO and, regarding the good improvements we
> > saw,
> > > we started to refactor the other IOs.
> > > >
> > > > The use of AutoValue should have been notice and discussed on the
> > > mailing list.
> > > >
> > > > "If it doesn't exist on the mailing list, it doesn't exist at all."
> > > >
> > > > So, any comment happening on a GitHub pull request, or discussion on
> > > hangouts which can impact the project (generally speaking) has to
happen
> > on
> > > the mailing list.
> > > >
> > > > It provides project transparency and facilitates the new
contribution
> > > onboarding.
> > > >
> > > > Thanks !
> > > >
> > > > Regards
> > > > JB
> > > >
> > >
> > > --
> > > Daniel Kulp
> > > dkulp@apache.org <mailto:dkulp@apache.org> - http://dankulp.com/blog
<
> > > http://dankulp.com/blog>
> > > Talend Community Coder - http://coders.talend.com <
> > > http://coders.talend.com/>
> > >
> >

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message