hadoop-common-dev mailing list archives

From Karthik Kambatla <ka...@cloudera.com>
Subject Re: [DISCUSS] Increased use of feature branches
Date Fri, 10 Jun 2016 15:28:15 GMT

On Fri, Jun 10, 2016 at 6:56 AM, Junping Du <jdu@hortonworks.com> wrote:

> Compared with the advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, incomplete features, etc.) would
> have to wait to be committed to trunk or be put into a separate branch,
> which could delay feature development, as an additional vote process gets
> involved even when the feature is simple and harmless.

Including these sorts of commits in trunk is a major pain.

One example from a recent mistake I made:
YARN-2877 and YARN-1011 had some common changes. Instead of putting them in
a separate branch, I committed these common changes to trunk because, well,
we don't release from trunk, so what could go wrong? After a few days, other
contributors and committers started feeling annoyed about having to submit
two different patches for trunk and branch-2. This inconvenience led to
those patches being pulled into branch-2 even though they were not ready
for inclusion in branch-2 or a 2.x release.

I feel the major friction for feature branches comes from only some
features using them. If everyone uses feature branches and we have better
processes around quantifying the stability of a feature branch, feature
branches should make for a smoother experience for everyone.

It is not uncommon for features to get merged into trunk before being ready,
with promises of follow-up work. While that might very well be the intent
of contributors, other work items come up and things get sidelined. How
often have we seen features without HA and security?

> - These commits left in separate branches are isolated and have more
> chance to conflict with each other, and more bugs could be introduced due
> to conflicts and/or fewer eyes watching/blessing isolated branches.

Partially agree. There is a tradeoff here: if we keep putting them into
trunk, they (1) destabilize trunk, and (2) conflict with other bug fixes
and smaller improvements.

> - More unnecessary arguments/debates will happen about whether some commits
> should land on trunk or a separate branch, just like what we have had recently.

Again, clearly defining the requirements to be merged into trunk will make
this easier. How is this different from what we do today for branch-2? If
we still have debates, that is probably required? Not having them today is
actually a concern?

> - Because the number of branches will increase massively, more community
> effort will be spent on reviewing and voting on branch merges, which means
> less effort will be spent reviewing other commits, given that our review
> bandwidth is already quite limited.

Yes and no. Strictly using feature branches will serialize features.
Integrating with other features is a one-time, albeit more involved,
process instead of multiple rebases/resolutions each somewhat involved.

> - For a small feature with only 1 or 2 commits, needing three +1s from PMC
> members will raise the bar considerably for contributors who are just
> starting to contribute Hadoop features and do not yet have enough support.

If a feature/improvement is not supported by 3 committers (not PMC
members), it is probably worth looking at why. Maybe this feature should
not be included at all?

I am open to changing the requirements for a merge. What do you think of
one +1 (thorough review) and two +0s (high-level review)?

If the concern is finding enough committers, I would like for the PMC to
consider voting in more committers and increasing bandwidth.

> Given these concerns, I am open to other options, like those proposed by
> Vinod or Chris, rather than releasing anything directly from trunk.

I actually thought this was Vinod's proposal. My understanding is Andrew is
resurfacing this so we finalize things.

> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle this first. Without a commonly agreed,
> transparent release process and branches in the community, any release
> bits (alpha, beta) can only be called a private release, not an official
> Apache Hadoop release (even an alpha).

I am absolutely with Junping here. Changing this process primarily requires
a change in our mental model. I think it is pretty important that we decide
on one approach preferably before doing an alpha release.

To clarify: our current approach (trunk and branch-2) has been working
okay. The only issue I see is in the way we take merging into trunk
lightly. If we have well-defined requirements for merging to trunk and take
those seriously, I am comfortable with using the approach for 3.x. The new
proposal forces following these requirements and hence I like it more.

> Thanks,
> Junping
> ________________________________________
> From: Karthik Kambatla <kasha@cloudera.com>
> Sent: Friday, June 10, 2016 7:49 AM
> To: Andrew Wang
> Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Increased use of feature branches
>
> Thanks for restarting this thread, Andrew. I really hope we can get this
> across to a VOTE so it is clear.
>
> I see a few advantages to shipping from trunk:
>    - The lack of need for one additional backport each time.
>    - Feature rot in trunk
>
> Instead of creating branch-3, I recommend creating branch-3.x so we can
> continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> said it :))
>
> On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <andrew.wang@cloudera.com>
> wrote:
> > Hi all,
> >
> > On a separate thread, a question was raised about 3.x branching and use of
> > feature branches going forward.
> >
> > We discussed this previously on the "Looking to a Hadoop 3 release" thread
> > that has spanned the years, with Vinod making this proposal (building on
> > ideas from others who also commented in the email thread):
> >
> > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >
> > Pasting here for ease:
> >
> > On an unrelated note, offline I was pitching to a bunch of
> > contributors another idea to deal with rotting trunk post 3.x:
> > *Make 3.x releases off of trunk directly*.
> >
> > What this gains us is that
> >  - Trunk is always nearly stable or nearly ready for releases
> >  - We no longer have some code lying around in some branch (today’s
> > trunk) that is not releasable because it gets mixed with other
> > undesirable and incompatible changes.
> >  - This needs to be coupled with more discipline on individual
> > features - medium to large features are always worked upon in branches
> > and get merged into trunk (and a nearing release!) when they are ready
> >  - All incompatible changes go into some sort of a trunk-incompat
> > branch and stay there till we accumulate enough of those to warrant
> > another major release.
> >
> > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> > there's no need for this branch yet. This aspect of Vinod's proposal was
> > still under a bit of discussion; Chris Douglas thought we should cut a
> > branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> > This point doesn't necessarily need to be resolved now though, since again
> > we're still doing alphas.
> >
> > What we should get consensus on is the goal of keeping trunk stable, and
> > achieving that by doing more development on feature branches and being
> > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > more recent one on the async API) is that people are generally in favor of
> > this.
> >
> > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > appreciate everyone's timely response in this matter.
> >
> > Thanks,
> > Andrew
> >
