hadoop-common-dev mailing list archives

From Sangjin Lee <sjl...@gmail.com>
Subject Re: [DISCUSS] git rebase vs. git merge for branch development
Date Tue, 18 Aug 2015 21:06:56 GMT
One other (long shot) option might be to do git cherry-picks of all new
*trunk* commits into the feature branch when you uprev. But I'm not sure if
that will be a sustainable practice, given the number of commits that are
happening on the trunk. Unless you're uprevving very often (e.g. daily),
this could also get out of hand.

On Tue, Aug 18, 2015 at 11:33 AM, Andrew Wang <andrew.wang@cloudera.com>
wrote:

> Sounds like we have a lot of support for also allowing merge workflows. Let
> me draft a proper proposal and go through the [DISCUSS] and [VOTE] process.
> One thing I think we should amend from the previous [VOTE] is using "git
> merge --no-ff" rather than "rebase --onto" for branch -> trunk integration,
> since it makes reverting the branch easier. We should also use "git merge"
> rather than a squashed commit for the branch-2 backport, as Vinay said.
>
> In the meantime, I think it's okay for ongoing feature branch development
> like HDFS-7285 to start using merge rather than rebase. Haven't seen any
> objections to merge yet.
>
> On Tue, Aug 18, 2015 at 1:39 AM, Vinayakumar B <vinayakumarb@apache.org>
> wrote:
>
> > +1, I agree with supporting git-merge based workflows for large branch
> > merges.
> >
> > I have experienced the pain of rebasing the entire HDFS-7285 branch, though
> > only for verification, and I found that even a one-line change in trunk in
> > core files (e.g. FSNamesystem.java, BlockManager.java) makes it hard to
> > rebase many commits in the branch.
> >   One main problem I have experienced with git-rebase is this: if we need
> > to retain the same commits, all conflicts must be resolved by the person
> > doing the rebase, since 'git rebase' has to be run on a single machine.
> > There is a fair chance of mishandling conflicts and causing problems,
> > because the person doing the rebase may not be very familiar with the
> > conflicted code.
> >   In these kinds of situations, I think it's very hard to tell what the
> > original code was and what came from conflict resolution once the rebase
> > is done.
> >
> > IMO, it's fair to go with periodic merges from trunk->branch. Even if there
> > are some conflicts, they are unlikely to be as problematic as rebase
> > conflicts.
> >
> >    Regarding merging to branch-2, it needs a little more conflict
> > resolution than trunk, but probably not too much, as trunk and branch-2 are
> > moving in parallel, at least in terms of features and fixes (~90% I would
> > say).
> >
> > Regards,
> > Vinay
> >
> > On Tue, Aug 18, 2015 at 6:12 AM, Sangjin Lee <sjlee@apache.org> wrote:
> >
> > > I also think allowing merges as a way to uprev with trunk would be a good
> > > idea. AFAIK, git rebase works well when your branch is short-lived and
> > > contains a fairly small number of commits, but doesn't work so well if
> > > your branch is large. Also, the cost of rebasing will only go up as time
> > > goes on. On the other hand, git merge has a pretty decent chance to
> > > succeed, especially if you merge trunk often. My 2 cents.
> > >
> > > Sangjin
> > >
> > > On Mon, Aug 17, 2015 at 1:18 PM, Jing Zhao <jing.apache@gmail.com>
> > > wrote:
> > >
> > > > I think we should allow merge-based workflows. I have worked, and am
> > > > working, on several big feature branches, including HDFS-2802 (>100
> > > > subtasks) and HDFS-7285 (currently already >200 subtasks), and have
> > > > tried both the merge-based and rebase-based workflows. When the feature
> > > > change becomes big, rebasing becomes a big pain, since a small change in
> > > > trunk can cause conflicts when rebasing a large number of commits in the
> > > > feature branch. Using "git merge" to merge trunk changes into the feature
> > > > branch is much easier in this case.
> > > >
> > > > Thanks,
> > > > -Jing
> > > >
> > > > On Mon, Aug 17, 2015 at 12:17 PM, Andrew Wang <andrew.wang@cloudera.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've thought about this topic more over the last week, and felt I
> > > should
> > > > > play devil's advocate for a merge workflow. A few comments:
> > > > >
> > > > >    - The issue of merges "polluting history" is mainly an issue when
> > > > >    using a GitHub PR workflow, which results in one merge per PR.
> > > > >    Clearly this is not okay, but it is a separate issue from feature
> > > > >    branches. We only have a handful of merge commits per feature branch.
> > > > >    - The issue of changes hiding in merge commits can happen when
> > > > >    resolving rebase conflicts too, except it's harder to track. Right
> > > > >    now neither goes through code review, which is sketchy. We probably
> > > > >    should review these too, and it's easier to review a single merge
> > > > >    commit vs. an entire rebased branch. Merge is also a more natural way
> > > > >    of integrating changes from trunk, since you just resolve all
> > > > >    conflicts at once at the end.
> > > > >    - Merge gives us a linear history on the branch but worse history on
> > > > >    trunk/branch-2. Rebase has worse history on the branch but a linear
> > > > >    history on trunk/branch-2. This means for quick/small feature branches
> > > > >    that don't have a lot of conflicts, rebase is preferred. For large
> > > > >    features with lots of conflicts, merge is preferred. This is basically
> > > > >    what we're running into on HDFS-7285.
> > > > >    - Rebase also comes with increased coordination costs, since public
> > > > >    history is being rewritten. This is again okay for smaller efforts
> > > > >    (where there are fewer contributors), but more painful with bigger
> > > > >    ones. There have been a number of HDFS-7285 branches created basically
> > > > >    as a result of rebase, with corresponding JIRA discussions about where
> > > > >    to commit things.
> > > > >    - The issue of a single squashed commit for the branch-2 backport is
> > > > >    arguably an issue with how we structure our branches. If release
> > > > >    branches forked off of trunk rather than branch-2, we wouldn't have
> > > > >    this problem. We could require branch-2 integration to also happen via
> > > > >    git merge. Or we kick trunk out to a feature branch based off of
> > > > >    branch-2. Or we shrug and keep the status quo.
> > > > >
> > > > > I'd definitely appreciate commentary from others who've worked on
> > > > > feature branches in git, even in communities outside of Hadoop.
> > > > >
> > > > > If there is support for allowing merge-based workflows in addition to
> > > > > rebase, we'd need to kick off a [VOTE] thread since the last [VOTE]
> > > > > only allows rebase.
> > > > >
> > > > > Best,
> > > > > Andrew
> > > > >
> > > > > On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang <andrew.wang@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > @Sangjin,
> > > > > >
> > > > > > I believe this is covered by the [VOTE] I linked to above, key excerpt
> > > > > > being:
> > > > > >
> > > > > >    3. Force-push on feature-branches is allowed. Before pulling in a
> > > > > >    feature, the feature-branch should be rebased on latest trunk and
> > > > > >    the changes applied to trunk through "git rebase --onto" or "git
> > > > > >    cherry-pick <commit-range>".
> > > > > >
> > > > > > This specifies that the last uprev and the final integration of the
> > > > > > branch into trunk happen with rebase. It doesn't say anything about
> > > > > > the periodic uprevs, but it'd be very strange to merge periodically
> > > > > > and then rebase once at the end. So I take it to mean doing periodic
> > > > > > uprevs with rebase too.
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee <sjlee@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Just to be clear, are we discussing the process of uprev'ing the
> > > > > >> feature development branch with the latest from the trunk from time
> > > > > >> to time, or making the final merge of the feature branch onto the
> > > > > >> trunk?
> > > > > >>
> > > > > >> On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran <stevel@hortonworks.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > I haven't done a big piece of work in the ASF code repo since the
> > > > > >> > migration to git, though I have done so in the svn era.
> > > > > >> >
> > > > > >> >
> > > > > >> > Currently, with private git repos:
> > > > > >> > -anyone gets SCM control of their source
> > > > > >> > -you can commit for your own reasons (about to make a change, want
> > > > > >> > a private Jenkins run, ...) and gain from having many small
> > > > > >> > check-ins. More succinctly: if you aren't checking in your work 2+
> > > > > >> > times a day, why not?
> > > > > >> > -rebasing is a painful necessity on personal, private branches to
> > > > > >> > keep the final patch to hadoop git a single diff
> > > > > >> >
> > > > > >> > With the private git process that's the de facto standard, we lose
> > > > > >> > history anyway. I know what I've done, and somewhere there's a tag
> > > > > >> > in my own GitHub repo of my work to create a JIRA. But we don't
> > > > > >> > always need that entire history of "trying to debug kerberos",
> > > > > >> > "typo in exception", and other stuff that accrues during the work.
> > > > > >> >
> > > > > >> > I think therefore that I'm in favour of big squash commits. What we
> > > > > >> > could do is extend that with a policy of:
> > > > > >> >
> > > > > >> >
> > > > > >> >   1.  Tag the final commit used to make the patch, something like
> > > > > >> > tag_HADOOP-8192. The tag ensures that the history isn't gc'd.
> > > > > >> >   2.  Delete the branch (keeps the # of branches down).
> > > > > >> >   3.  In the JIRA, include the name of the tag and the git commit
> > > > > >> > hash in the comments. Someone curious can rebuild that history.
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
