hadoop-common-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: [DISCUSS] git rebase vs. git merge for branch development
Date Mon, 17 Aug 2015 19:17:13 GMT
Hi all,

I've thought about this topic more over the last week, and felt I should
play devil's advocate for a merge workflow. A few comments:

   - The issue of merges "polluting history" is mainly an issue when using
   a github PR workflow, which results in one merge per PR. Clearly this is
   not okay, but a separate issue from feature branches. We only have a
   handful of merge commits per feature branch.
   - The issue of changes hiding in merge commits can happen when resolving
   rebase conflicts too, except it's harder to track. Right now neither go
   through code review, which is sketchy. We probably should review these too,
   and it's easier to review a single merge commit vs. an entire rebased
   branch. Merge is also a more natural way of integrating changes from trunk,
   since you just resolve all conflicts at once at the end.
   - Merge gives us a linear history on the branch but worse history on
   trunk/branch-2. Rebase has worse history on the branch but a linear history
   on trunk/branch-2. This means for quick/small feature branches that don't
   have a lot of conflicts, rebase is preferred. For large features with lots
   of conflicts, merge is preferred. This is basically what we're running into
   on HDFS-7285.
   - Rebase also comes with increased coordination costs, since public
   history is being rewritten. This is again okay for smaller efforts (where
   there are fewer contributors), but more painful with bigger ones. There
   have been a number of HDFS-7285 branches created basically as a result of
   rebase, with corresponding JIRA discussions about where to commit things.
   - The issue of a single squashed commit for the branch-2 backport is
   arguably an issue with how we structure our branches. If release branches
   forked off of trunk rather than branch-2, we wouldn't have this problem. We
   could require branch-2 integration to also happen via git merge. Or we kick
   trunk out to a feature branch based off of branch-2. Or we shrug and keep
   the status quo.
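
To make the trade-off in the bullets above concrete, here is a minimal sketch on a throwaway repository. The branch names `feature-merge` and `feature-rebase`, the `commit` helper, and the file names are all made up for illustration; only stock git commands are used. It uprevs one branch from trunk with merge and an identical branch with rebase, then checks which one kept its existing commit IDs.

```shell
#!/bin/sh
# Sketch only: compares a merge-based uprev with a rebase-based uprev.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b trunk
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Hypothetical helper: write one file and commit it.
commit() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit base.txt base "trunk: base"
git checkout -q -b feature-merge
commit feature.txt work "feature: work"
git branch feature-rebase            # second branch at the same commit
git checkout -q trunk
commit trunk.txt new "trunk: later change"

# Uprev via merge: adds one merge commit, existing commits are untouched.
git checkout -q feature-merge
before_merge=$(git rev-parse HEAD)
git merge -q --no-edit trunk
merge_count=$(git rev-list --merges --count HEAD)
if git merge-base --is-ancestor "$before_merge" HEAD; then
  merge_keeps_ids=yes
else
  merge_keeps_ids=no
fi

# Uprev via rebase: no merge commit, but the feature commit is rewritten,
# so a shared branch would need a force-push afterwards.
git checkout -q feature-rebase
before_rebase=$(git rev-parse HEAD)
git rebase -q trunk
rebase_merge_count=$(git rev-list --merges --count HEAD)
if git merge-base --is-ancestor "$before_rebase" HEAD; then
  rebase_keeps_ids=yes
else
  rebase_keeps_ids=no
fi

echo "merge:  merge commits=$merge_count, old tip still in history=$merge_keeps_ids"
echo "rebase: merge commits=$rebase_merge_count, old tip still in history=$rebase_keeps_ids"
```

The rewritten commit IDs on the rebased branch are where the coordination cost comes from: anyone who based work on the old tip has to move it onto the new one.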

I'd definitely appreciate commentary from others who've worked on feature
branches in git, even in communities outside of Hadoop.

If there is support for allowing merge-based workflows in addition to
rebase, we'd need to kick off a [VOTE] thread since the last [VOTE] only
allows rebase.


On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang <andrew.wang@cloudera.com>
wrote:

> @Sangjin,
> I believe this is covered by the [VOTE] I linked to above, key excerpt
> being:
>    3. Force-push on feature-branches is allowed. Before pulling in a
>    feature, the feature-branch should be rebased on latest trunk and the
>    changes applied to trunk through "git rebase --onto" or "git cherry-pick
>    <commit-range>".
> This specifies that the last uprev and the final integration of the branch
> into trunk happen with rebase. It doesn't say anything about the periodic
> uprevs, but it'd be very strange to merge periodically and then rebase once
> at the end. So I take it to mean doing periodic uprevs with rebase too.
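
One way the integration step in the quoted [VOTE] excerpt might look, sketched on a throwaway repository. The branch name `feature`, the `commit` helper, and the file names are invented for illustration. This uses the `git cherry-pick <commit-range>` variant; `git rebase --onto` is the other option the excerpt names.

```shell
#!/bin/sh
# Sketch only: rebase a feature branch on latest trunk, then apply it to trunk.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b trunk
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Hypothetical helper: write one file and commit it.
commit() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit base.txt base "trunk: base"
git checkout -q -b feature
commit f1.txt one "feature: part 1"
commit f2.txt two "feature: part 2"
git checkout -q trunk
commit trunk.txt new "trunk: later change"

# Step 1: rebase the feature branch on latest trunk. This rewrites the
# branch, which is why the policy allows force-push on feature branches.
git checkout -q feature
git rebase -q trunk

# Step 2: apply the rebased commits to trunk as a commit range.
git checkout -q trunk
git cherry-pick trunk..feature >/dev/null

merges_on_trunk=$(git rev-list --merges --count trunk)
trunk_commits=$(git rev-list --count trunk)
echo "trunk has $trunk_commits commits and $merges_on_trunk merge commits"
```

Trunk ends up with the feature commits in a straight line and no merge commit, which is the linear trunk history the rebase workflow is meant to give.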
> On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee <sjlee@apache.org> wrote:
>> Just to be clear, are we discussing the process of uprev'ing the feature
>> development branch with the latest from the trunk from time to time, or
>> making the final merge of the feature branch onto the trunk?
>> On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran <stevel@hortonworks.com>
>> wrote:
>> > I haven't done a big piece of work in the ASF code repo since the
>> > migration to git; though I have done it in the svn era.
>> >
>> >
>> > Currently with private git repos
>> > -anyone gets SCM control of their source
>> > -you can commit for your own reasons (about to make a change, want a
>> > private jenkins run, ...) and gain from having many small checkins. More
>> > succinctly: if you aren't checking in your work 2+ times a day —why not?
>> > -rebasing is a painful necessity on personal, private branches to keep the
>> > final patch to hadoop git a single diff
>> >
>> > With the private git process that's the de facto standard, we lose history
>> > anyway. I know what I've done and somewhere there's a tag in my own github
>> > repo of my work to create a JIRA. But we don't always need that entire
>> > history of "trying to debug kerberos", "typo in exception", and other stuff
>> > that accrues during the work.
>> >
>> > I think therefore that I'm in favour of big squash commits. What we could
>> > do is extend that with a policy of
>> >
>> >
>> >   1.  tag the final commit used to make the patch, something like
>> > tag_HADOOP-8192. The tag ensures that the history isn't gc'd
>> >   2.  Delete the branch (keeps the # of branches down)
>> >   3.  In the JIRA, include the name of the tag and the git commit ID
>> > in the comments. Someone curious can rebuild that history
>> >
>> >
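
The three-step tag policy above could be sketched as follows on a throwaway repository. The tag name `tag_HADOOP-8192` comes from the message; the branch name `HADOOP-8192-dev`, the `commit` helper, the commit messages, and the use of `git merge --squash` to produce the single commit are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: squash a work branch onto trunk, keep its history behind a tag.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b trunk
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Hypothetical helper: write one file and commit it.
commit() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit base.txt base "trunk: base"
git checkout -q -b HADOOP-8192-dev   # hypothetical work branch
commit a.txt one "trying to debug kerberos"
commit b.txt two "typo in exception"
git checkout -q trunk

# 1. Tag the final commit used to make the patch.
git tag tag_HADOOP-8192 HADOOP-8192-dev

# Land the work on trunk as one squashed commit (assumed mechanism).
git merge -q --squash HADOOP-8192-dev >/dev/null
git commit -q -m "HADOOP-8192. Example squashed patch"

# 2. Delete the branch. -D is needed because a squashed branch does not
#    count as merged from git's point of view.
git branch -q -D HADOOP-8192-dev

# 3. Record the tag name and commit ID for the JIRA comment.
tagged_commit=$(git rev-parse tag_HADOOP-8192)
history_count=$(git rev-list --count tag_HADOOP-8192)
echo "tag_HADOOP-8192 -> $tagged_commit ($history_count commits reachable)"
```

Anyone curious can later run `git log tag_HADOOP-8192` to see the small commits that the squashed trunk commit replaced.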
