hadoop-common-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: [DISCUSS] git rebase vs. git merge for branch development
Date Tue, 18 Aug 2015 18:33:03 GMT
Sounds like we have a lot of support for also allowing merge workflows. Let
me draft a proper proposal and go through the [DISCUSS] and [VOTE] process.
One thing I think we should amend from the previous [VOTE] is to use "git
merge --no-ff" rather than "rebase --onto" for branch -> trunk integration,
since it makes reverting the branch easier. We should also use "git merge"
rather than a squashed commit for the branch-2 backport, as Vinay said.
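
To make the revert point concrete, here is a minimal sketch in a throwaway
repository. The branch names "trunk" and "feature" and the file names are
placeholders for illustration, not the actual Hadoop layout, and this is not
a proposed procedure. It shows that a "--no-ff" merge leaves a single merge
commit, which "git revert -m 1" can then undo in one step.

```shell
#!/bin/sh
set -eu

# Throwaway repository; "trunk", "feature", and the file names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial trunk commit"

# Feature branch with two commits.
git checkout -q -b feature
echo "part 1" > feature.txt
git add feature.txt
git commit -q -m "Feature part 1"
echo "part 2" >> feature.txt
git commit -q -a -m "Feature part 2"

# --no-ff forces a merge commit even though trunk could simply fast-forward.
git checkout -q trunk
git merge -q --no-ff -m "Merge branch 'feature' into trunk" feature
merge_commit=$(git rev-parse HEAD)
parent_count=$(git cat-file -p "$merge_commit" | grep -c '^parent ')

# Revert the whole branch in one commit; -m 1 names the first parent
# (the trunk side of the merge) as the mainline to keep.
git revert --no-edit -m 1 "$merge_commit" > /dev/null

if [ -e feature.txt ]; then feature_file=present; else feature_file=absent; fi
echo "merge commit parents: $parent_count"
echo "feature.txt after revert: $feature_file"
```

With "rebase --onto" or a fast-forward there is no such single commit, so
backing the branch out would mean reverting each of its commits.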

In the meantime, I think it's okay for ongoing feature branch development
like HDFS-7285 to start using merge rather than rebase. Haven't seen any
objections to merge yet.
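
For reference, a rough sketch of what a periodic trunk -> branch merge looks
like, again in a throwaway repository with placeholder branch and file names
(assumptions for illustration only). It checks that the feature branch's
earlier tip is still an ancestor of the branch after the merge, i.e. the
published history is extended rather than rewritten, so no force-push is
needed.

```shell
#!/bin/sh
set -eu

# Throwaway repository; "trunk", "feature", and the file names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "core v1" > core.txt
git add core.txt
git commit -q -m "Initial trunk commit"

# Feature work starts from trunk.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature commit"
old_tip=$(git rev-parse HEAD)

# Meanwhile trunk moves on.
git checkout -q trunk
echo "core v2" > core.txt
git commit -q -a -m "Trunk change"

# Periodic uprev: merge trunk into the feature branch.
git checkout -q feature
git merge -q -m "Merge trunk into feature" trunk

# The old feature tip is still reachable, so existing clones stay valid.
if git merge-base --is-ancestor "$old_tip" HEAD; then
  history_preserved=yes
else
  history_preserved=no
fi
core_content=$(cat core.txt)
echo "history preserved: $history_preserved"
echo "core.txt on feature: $core_content"
```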

On Tue, Aug 18, 2015 at 1:39 AM, Vinayakumar B <vinayakumarb@apache.org>
wrote:

> +1, I agree with supporting git-merge based workflows for large branch
> merges.
>
> I have experienced the pain of rebasing the entire HDFS-7285 branch (just
> for verification, though), and I found that even a one-line change in trunk
> in core files (e.g. FSNameSystem.java, BlockManager.java) makes it hard to
> rebase the many commits in the branch.
>   One main problem with git-rebase, as I have experienced it:
>   If we need to retain the same commits, all conflicts must be resolved by
> the same person who is doing the rebase, since 'git-rebase' has to be run
> on a single machine, and there is a fair chance of mishandling conflicts and
> causing problems. The person doing the rebase may not be very familiar with
> the conflicted code.
>   In this kind of situation, I think it's very hard to find out what the
> original code was and what the conflicted code was once the rebase is done.
>
> IMO, it's fair to go with periodic merges from trunk->branch. Even though
> there will be small conflicts, these should be much less problematic than
> rebase conflicts.
>
>    Regarding merging to branch-2: it needs a little more conflict
> resolution than trunk, but probably not too much, as trunk and
> branch-2 are moving in parallel, at least in terms of features and fixes
> (~90% or more, I would say).
>
> Regards,
> Vinay
>
> On Tue, Aug 18, 2015 at 6:12 AM, Sangjin Lee <sjlee@apache.org> wrote:
>
> > I also think allowing merges as a way to uprev with trunk would be a good
> > idea. AFAIK, git rebase works well when your branch is short-lived and
> > contains a fairly small number of commits, but doesn't work so well if your
> > branch is large. Also, the cost of rebase will only go up as time goes on.
> > On the other hand, git merge has a pretty decent chance to succeed,
> > especially if you merge the trunk often. My 2 cents.
> >
> > Sangjin
> >
> > On Mon, Aug 17, 2015 at 1:18 PM, Jing Zhao <jing.apache@gmail.com> wrote:
> >
> > > I think we should allow merge-based workflows. I have worked and am
> > > working in several big feature branches, including HDFS-2802 (>100
> > > subtasks) and HDFS-7285 (currently already > 200 subtasks), and tried
> > > both the merge-based and rebase-based workflows. When the feature change
> > > becomes big, the rebase becomes a big pain, considering that a small
> > > change in trunk can cause conflicts when rebasing a large number of
> > > commits in the feature branch. Using "git merge" to merge trunk changes
> > > into the feature branch is much easier in this case.
> > >
> > > Thanks,
> > > -Jing
> > >
> > > On Mon, Aug 17, 2015 at 12:17 PM, Andrew Wang <andrew.wang@cloudera.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've thought about this topic more over the last week, and felt I should
> > > > play devil's advocate for a merge workflow. A few comments:
> > > >
> > > >    - The issue of merges "polluting history" is mainly an issue when using
> > > >    a github PR workflow, which results in one merge per PR. Clearly this is
> > > >    not okay, but a separate issue from feature branches. We only have a
> > > >    handful of merge commits per feature branch.
> > > >    - The issue of changes hiding in merge commits can happen when resolving
> > > >    rebase conflicts too, except it's harder to track. Right now neither go
> > > >    through code review, which is sketchy. We probably should review these
> > > >    too, and it's easier to review a single merge commit vs. an entire
> > > >    rebased branch. Merge is also a more natural way of integrating changes
> > > >    from trunk, since you just resolve all conflicts at once at the end.
> > > >    - Merge gives us a linear history on the branch but worse history on
> > > >    trunk/branch-2. Rebase has worse history on the branch but a linear
> > > >    history on trunk/branch-2. This means for quick/small feature branches
> > > >    that don't have a lot of conflicts, rebase is preferred. For large
> > > >    features with lots of conflicts, merge is preferred. This is basically
> > > >    what we're running into on HDFS-7285.
> > > >    - Rebase also comes with increased coordination costs, since public
> > > >    history is being rewritten. This is again okay for smaller efforts
> > > >    (where there are fewer contributors), but more painful with bigger ones.
> > > >    There have been a number of HDFS-7285 branches created basically as a
> > > >    result of rebase, with corresponding JIRA discussions about where to
> > > >    commit things.
> > > >    - The issue of a single squashed commit for the branch-2 backport is
> > > >    arguably an issue with how we structure our branches. If release
> > > >    branches forked off of trunk rather than branch-2, we wouldn't have this
> > > >    problem. We could require branch-2 integration to also happen via git
> > > >    merge. Or we kick trunk out to a feature branch based off of branch-2.
> > > >    Or we shrug and keep the status quo.
> > > >
> > > > I'd definitely appreciate commentary from others who've worked on feature
> > > > branches in git, even in communities outside of Hadoop.
> > > >
> > > > If there is support for allowing merge-based workflows in addition to
> > > > rebase, we'd need to kick off a [VOTE] thread since the last [VOTE] only
> > > > allows rebase.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang <andrew.wang@cloudera.com>
> > > > wrote:
> > > >
> > > > > @Sangjin,
> > > > >
> > > > > I believe this is covered by the [VOTE] I linked to above, key excerpt
> > > > > being:
> > > > >
> > > > >    3. Force-push on feature-branches is allowed. Before pulling in a
> > > > >    feature, the feature-branch should be rebased on latest trunk and the
> > > > >    changes applied to trunk through "git rebase --onto" or "git
> > > > >    cherry-pick <commit-range>".
> > > > >
> > > > > This specifies that the final integration of the branch into trunk
> > > > > happen with rebase. It doesn't say anything about the periodic uprevs,
> > > > > but it'd be very strange to merge periodically and then rebase once at
> > > > > the end. So I take it to mean doing periodic uprevs with rebase too.
> > > > >
> > > > >
> > > > > On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee <sjlee@apache.org> wrote:
> > > > >
> > > > >> Just to be clear, are we discussing the process of uprev'ing the
> > > > >> feature development branch with the latest from the trunk from time
> > > > >> to time, or making the final merge of the feature branch onto the trunk?
> > > > >>
> > > > >> On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran <stevel@hortonworks.com>
> > > > >> wrote:
> > > > >>
> > > > >> > I haven't done a big piece of work in the ASF code repo since the
> > > > >> > migration to git, though I have done it in the svn era.
> > > > >> >
> > > > >> >
> > > > >> > Currently with private git repos
> > > > >> > -anyone gets SCM control of their source
> > > > >> > -you can commit for your own reasons (about to make a change, want a
> > > > >> > private jenkins run, ...) and gain from having many small checkins.
> > > > >> > More succinctly: if you aren't checking in your work 2+ times a day,
> > > > >> > why not?
> > > > >> > -rebasing is a painful necessity on personal, private branches to keep
> > > > >> > the final patch to hadoop git a single diff
> > > > >> >
> > > > >> > With the private git process that's the de facto standard, we lose
> > > > >> > history anyway. I know what I've done and somewhere there's a tag in
> > > > >> > my own github repo of my work to create a JIRA. But we don't always
> > > > >> > need that entire history of "trying to debug kerberos", "typo in
> > > > >> > exception", and other stuff that accrues during the work.
> > > > >> >
> > > > >> > I think therefore that I'm in favour of big squash commits. What we
> > > > >> > could do is extend that with a policy of
> > > > >> >
> > > > >> >
> > > > >> >   1.  tag the final commit used to make the patch, something like
> > > > >> > tag_HADOOP-8192. The tag ensures that the history isn't gc'd
> > > > >> >   2.  Delete the branch (keeps the #of branches down)
> > > > >> >   3.  In the JIRA, include the name of the tag and the git commit
> > > > >> > number in the comments. Someone curious can rebuild that history
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
