hadoop-hdfs-dev mailing list archives

From Sangjin Lee <sj...@apache.org>
Subject Re: [DISCUSS] Increased use of feature branches
Date Fri, 10 Jun 2016 21:10:05 GMT
Thanks for your thoughts Anu.

Regarding your question

> And then comes the question, once 3.0 becomes official, where do we
> check-in a change,  if that would break something? so this will lead us
> back to trunk being the unstable – 3.0 being the new “branch-2”.


Andrew mentioned in the original email

> Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
> was still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.


and I agree with that sentiment. I think even if we have a "trunk-incompat"
branch to hold future incompatible changes, the situation will change
little from today. Instead of dealing with "trunk" (where incompatible
changes may appear) and "branch-3", we would be dealing with
"trunk-incompat" and "trunk". Names are largely mnemonics then.


On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <aengineer@hortonworks.com>
wrote:

> I actively work on two branches (Diskbalancer and ozone) and I agree with
> most of what Sangjin said.
> There is an overhead in working with branches: there are both technical
> costs and administrative issues
> that discourage developers from using branches.
>
> I think the biggest issue with branch-based development is the fact that
> other developers do not use a branch.
> If a small feature appears as a series of commits to “datanode.java”,
> the branch-based developer ends up rebasing
> and paying this price of rebasing many times. If everyone followed a model
> of branch + pull request, other branches
> would not have to deal with continuous rebasing onto trunk commits. If we are
> moving to branch-based
> development, we should probably move to that model for most development to
> avoid this tax on people who
> actually end up working in the branches.
>
> I do have a question in my mind though: What is being proposed is that we
> move active development to branches
> if the feature is small or incomplete, but keep the trunk open for
> check-ins. One of the biggest reasons we
> check in to trunk and not to branch-2 is that the change
> will break backward compatibility. So do we
> have an expectation of backward compatibility through the 3.0-alpha series?
> (I personally vote no, since 3.0 is experimental
> at this stage.) If we decide to support some sort of backward compatibility,
> then willy-nilly committing to trunk
> while still maintaining the expectation that we can release alphas from 3.0
> does not look possible.
>
> And then comes the question, once 3.0 becomes official, where do we
> check-in a change,  if that would break something?
> so this will lead us back to trunk being the unstable – 3.0 being the new
> “branch-2”.
>
> One more point: If we always use a branch, then we are
> looking at a model similar to a git + pull
> request model. If that is so, would it make sense to modify the rules to
> make these branches easier to merge?
> Say, for example, if all commits in a branch have followed review and
> checking policy, just like trunk, and commits
> have been made only after a sign-off from a committer, would it be
> possible to merge with a 3-day voting period
> instead of 7, or treat it just like today’s commit to trunk, but with 2
> people signing off?
>
> What I am suggesting is reducing the administrative overhead of using a
> branch to encourage use of branching.
> Right now it feels like Apache’s process encourages committing directly to
> trunk rather than to a branch.
>
> Thanks
> Anu
>
>
> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>
> >Having worked on a major feature in a feature branch, I have some thoughts
> >and observations on feature branch development.
> >
> >IMO feature branch development v. direct commits to trunk in piecemeal is
> >really a choice of *granularity*. Do we want a series of fine-grained
> >state changes on trunk or fewer coarse-grained chunks of commits on trunk?
> >
> >This makes me favor a branch-based development model for any
> >"decent-sized" features (we'll need to define "decent-sized" of course).
> >Once you have coarse-grained changes, it's easier to reason about what
> >made what release and in what state. As importantly, it makes it easier
> >to back out a complete feature if that becomes necessary. My totally
> >unscientific suggestion may be if a feature takes more than a dozen
> >commits and longer than a month, we should probably have a bias towards
> >a feature branch.
> >
> >Branch-based development also makes you go faster if your feature is
> >larger. I wouldn't do it the other way for timeline service v.2, for
> >example.
> >
> >That said, feature branches don't come for free. Now the onus is on the
> >feature developer to constantly rebase with the trunk to keep it
> >reasonably integrated with the trunk. More logistics are involved for the
> >feature developer. Another big question is, when a feature branch gets
> >big and it's time to merge, would it get as scrutinized as a series of
> >individual commits? Since the size of merge can be big, you kind of have
> >to rely on those feature committers and those who help them.
> >
> >In terms of integrating/stabilizing, I don't think branch development
> >necessarily makes it harder. It is again granularity. In case of direct
> >commits on trunk, you do a lot more fine-grained integrations. In case of
> >branch development, you do far fewer coarse-grained integrations via
> >rebasing. If more people are doing branch-based development, it makes
> >rebasing easier to manage too.
> >
> >Going back to the related topic of where to release (trunk v. branch-X), I
> >think that is more of a proxy of the real question of "how do we maintain
> >quality and stability of the trunk?". Even if we release from the trunk,
> >if our bar for merging to trunk is low, the quality will not improve
> >automatically. So I think we ought to tackle the quality question first.
> >
> >My 2 cents.
> >
> >
> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zhz@apache.org> wrote:
> >
> >> Thanks for the notes Andrew, Junping, Karthik.
> >>
> >> Here are some of my understandings:
> >>
> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> >> Hadoop today, without legacy workloads, trunk is what he/she should use.
> >> - Therefore, each commit to trunk should be transactional -- atomic,
> >> consistent, isolated (from other uncommitted patches); I'm not so sure
> >> about durability, Hadoop might be gone in 50 years :). As a committer, I
> >> should be able to look at a patch and determine whether it's a
> >> self-contained improvement of trunk, without looking at other
> >> uncommitted patches.
> >> - Some comments inline:
> >>
> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jdu@hortonworks.com> wrote:
> >>
> >> > Compared with the advantages, I believe the disadvantages of shipping
> >> > any releases directly from trunk are more obvious and significant:
> >> > - A lot of commits (incompatible, risky, uncompleted features, etc.)
> >> > have to wait to be committed to trunk or be put into a separate branch,
> >> > which could delay feature development because an additional vote
> >> > process gets involved even if the feature is simple and harmless.
> >> >
> >> Thanks Junping, those are valid concerns. I think we should clearly
> >> separate incompatible changes from uncompleted / half-done work in this
> >> discussion. Whether people should commit incompatible changes to trunk
> >> is a much more tricky question (related to trunk-incompat etc.). But per
> >> my comment above, IMHO, *not committing uncompleted work to trunk*
> >> should be a much easier principle to agree upon.
> >>
> >>
> >> > - For a small feature with only 1 or 2 commits, needing three +1s from
> >> > PMC members will greatly raise the bar for contributors who are just
> >> > starting to contribute Hadoop features and do not have such support.
> >> >
> >> Development overhead is another valid concern. I think our rule-of-thumb
> >> should be that, small-medium new features should be proposed as a single
> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> >> beyond a single JIRA/patch, use a feature branch.
> >>
> >>
> >> >
> >> > Given these concerns, I am open to other options, such as those
> >> > proposed by Vinod or Chris, rather than releasing anything directly
> >> > from trunk.
> >> >
> >> > - This point doesn't necessarily need to be resolved now though, since
> >> > again we're still doing alphas.
> >> > No. I think we have to settle this first. Without a commonly agreed
> >> > and transparent release process and branches in the community, any
> >> > release (alpha, beta) bits can only be called a private release, not an
> >> > official Apache Hadoop release (even an alpha).
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> > ________________________________________
> >> > From: Karthik Kambatla <kasha@cloudera.com>
> >> > Sent: Friday, June 10, 2016 7:49 AM
> >> > To: Andrew Wang
> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> > Subject: Re: [DISCUSS] Increased use of feature branches
> >> >
> >> > Thanks for restarting this thread Andrew. I really hope we can get
> >> > this across to a VOTE so it is clear.
> >> >
> >> > I see a few advantages shipping from trunk:
> >> >
> >> >    - The lack of need for one additional backport each time.
> >> >    - Feature rot in trunk
> >> >
> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
> >> > can continue doing 3.x releases off branch-3 even after we move trunk
> >> > to 4.x (I said it :))
> >> >
> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang
> >> > <andrew.wang@cloudera.com> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > On a separate thread, a question was raised about 3.x branching and
> >> > > use of feature branches going forward.
> >> > >
> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
> >> > > thread that has spanned the years, with Vinod making this proposal
> >> > > (building on ideas from others who also commented in the email
> >> > > thread):
> >> > >
> >> > >
> >> > >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >> > >
> >> > > Pasting here for ease:
> >> > >
> >> > > On an unrelated note, offline I was pitching to a bunch of
> >> > > contributors another idea to deal with rotting trunk post 3.x:
> >> > > *Make 3.x releases off of trunk directly*.
> >> > >
> >> > > What this gains us is that
> >> > >  - Trunk is always nearly stable or nearly ready for releases
> >> > >  - We no longer have some code lying around in some branch (today’s
> >> > >    trunk) that is not releasable because it gets mixed with other
> >> > >    undesirable and incompatible changes.
> >> > >  - This needs to be coupled with more discipline on individual
> >> > >    features - medium to large features are always worked upon in
> >> > >    branches and get merged into trunk (and a nearing release!) when
> >> > >    they are ready
> >> > >  - All incompatible changes go into some sort of a trunk-incompat
> >> > >    branch and stay there till we accumulate enough of those to
> >> > >    warrant another major release.
> >> > >
> >> > > Regarding "trunk-incompat", since we're still in the alpha stage for
> >> > > 3.0.0, there's no need for this branch yet. This aspect of Vinod's
> >> > > proposal was still under a bit of discussion; Chris Douglas thought we
> >> > > should cut a branch-3 for the first 3.0.0 beta, which aligns with my
> >> > > original thinking. This point doesn't necessarily need to be resolved
> >> > > now though, since again we're still doing alphas.
> >> > >
> >> > > What we should get consensus on is the goal of keeping trunk stable,
> >> > > and achieving that by doing more development on feature branches and
> >> > > being judicious about merges. My sense from the Hadoop 3 email thread
> >> > > (and the more recent one on the async API) is that people are
> >> > > generally in favor of this.
> >> > >
> >> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> >> > > appreciate everyone's timely response in this matter.
> >> > >
> >> > > Thanks,
> >> > > Andrew
> >> > >
> >> >
> >> >
> >> >
> >>
>
>
