hadoop-common-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject [DISCUSS] git rebase vs. git merge for branch development
Date Tue, 11 Aug 2015 21:19:57 GMT
Hi all,

We are currently working on a pretty substantial new feature in a branch
over at HDFS-7285. As the number of commits has grown, running `git rebase` and
fixing conflicts in the 180+ commits has become untenable. As you may
recall, we voted to use a rebase workflow when we did the switch from SVN
to git a year ago [1].

I'm aware of two proposals right now:

========

Proposal 1: Squash some of the commits to make rebase easier.

Oftentimes, intermediate commits change code that gets changed again
later, so their changes don't end up in HEAD. Fixing conflicts in these
intermediate commits is a waste of time, especially with 180 commits. I run
into this issue even with my local feature branches, and thus squash.

The downside is that squashing loses some of the development history, since
now multiple JIRAs are combined into a single commit. There are some ways
to mitigate this: the old branch with the full history can be left in
place, and the squashed commits can reference the JIRAs that have been
squashed together.
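
As a rough sketch of what squashing while keeping the full history around could look like: the script below builds a throwaway repository, so the branch names (`feature`, `feature-full-history`) and the `EXAMPLE-n` issue IDs are made up for illustration and are not taken from HDFS-7285. It uses a soft reset to the merge base, which is only one of several ways to squash (an interactive rebase is another).

```shell
#!/bin/sh
# Sketch only: scratch repo, hypothetical branch names and issue IDs.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk      # name the initial branch "trunk"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "Initial trunk commit"

# Feature branch with several intermediate commits.
git checkout -q -b feature
for n in 1 2 3; do
  echo "change $n" >> file.txt
  git commit -q -am "EXAMPLE-$n. Intermediate change $n"
done

# Leave the old branch with the full history in place.
git branch feature-full-history feature

# Squash: move the branch pointer back to the merge base while keeping
# the final tree staged, then record it as a single commit that
# references the issues that were combined.
base=$(git merge-base trunk feature)
git reset -q --soft "$base"
git commit -q -m "Squash of EXAMPLE-1, EXAMPLE-2, EXAMPLE-3"

squashed_count=$(git rev-list --count trunk..feature)
full_count=$(git rev-list --count trunk..feature-full-history)
echo "squashed=$squashed_count full=$full_count"
```

After this, a later `git rebase trunk` on `feature` only has one commit to replay, while `feature-full-history` still shows the per-issue commits.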

========

Proposal 2: Allow merge-based workflows too.

This is what we were doing in the SVN days. Periodically merge trunk to the
branch, resulting in merge commits to resolve conflicts. When the branch is
ready, merge it back to trunk.
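
For illustration, the merge-based flow might look roughly like the following in git. This is a sketch in a throwaway repository; the branch name `feature` and the `EXAMPLE-n` issue IDs are hypothetical, and the choice of `--no-ff` for the final merge is an assumption rather than something the proposal specifies.

```shell
#!/bin/sh
# Sketch only: scratch repo, hypothetical branch names and issue IDs.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk      # name the initial branch "trunk"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > a.txt
git add a.txt
git commit -q -m "Initial trunk commit"

# Work happens on the feature branch...
git checkout -q -b feature
echo feature > b.txt
git add b.txt
git commit -q -m "EXAMPLE-1. Feature work"

# ...while trunk keeps moving.
git checkout -q trunk
echo trunk > c.txt
git add c.txt
git commit -q -m "EXAMPLE-2. Unrelated trunk change"

# Periodically merge trunk into the branch; any conflicts would be
# resolved in the resulting merge commit.
git checkout -q feature
git merge -q --no-edit trunk

# When the branch is ready, merge it back to trunk. --no-ff forces a
# merge commit even though a fast-forward would be possible here.
git checkout -q trunk
git merge -q --no-ff --no-edit feature

merge_count=$(git rev-list --merges --count trunk)
echo "merge commits reachable from trunk: $merge_count"
```

Both merge commits end up in trunk's history and neither corresponds to a single issue.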

I read through the discussion thread [2] where we decided to go with
rebase. The concerns were that merge commits pollute history, which was an
issue for HBase and I believe Spark. Merge commits are not associated with
a single JIRA or commit, and fixes are sometimes hidden in merge commits.
This makes backports harder.

Merge-based workflows also squash the history when backporting to a branch.
In the SVN merge-based days, backporting to branch-2 was typically done as
a single squashed commit. With a rebase workflow, it's possible to rebase
the branch against branch-2 and get the same history as trunk.
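
One way to picture rebasing the branch against branch-2 is `git rebase --onto`. The sketch below uses a throwaway repository with made-up branch names (`feature`, `feature-branch-2`) and `EXAMPLE-n` issue IDs, and it assumes the feature commits apply cleanly to branch-2; in a real backport, conflicts would have to be fixed per commit.

```shell
#!/bin/sh
# Sketch only: scratch repo, hypothetical branch names and issue IDs.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk      # name the initial branch "trunk"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > base.txt
git add base.txt
git commit -q -m "Common ancestor"
git branch branch-2                          # release branch forks here

echo trunk-only > trunk.txt
git add trunk.txt
git commit -q -m "Trunk-only change"

# Feature branch developed against trunk, one commit per issue.
git checkout -q -b feature
for n in 1 2; do
  echo "step $n" > "feature$n.txt"
  git add "feature$n.txt"
  git commit -q -m "EXAMPLE-$n. Feature step $n"
done

# Replay only the feature commits (trunk..feature) on top of branch-2,
# using a copy of the branch so the trunk-based one is left untouched.
git checkout -q -b feature-branch-2 feature
git rebase -q --onto branch-2 trunk feature-branch-2

trunk_subjects=$(git log --format=%s trunk..feature)
backport_subjects=$(git log --format=%s branch-2..feature-branch-2)
echo "$backport_subjects"
```

The per-issue commits appear on top of branch-2 in the same order as on trunk, without pulling in the trunk-only change.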

========

My mild preference is for Proposal #1 since it results in a clean linear
history in both trunk and branch-2, but it has to be understood that
squashing is sometimes a required part of a rebase workflow. If the core
issue with squashing is maintaining development history, I think it's
satisfied by keeping old branches around and referencing the squashed JIRAs.

Other thoughts are welcome here too.

Best,
Andrew

[1]:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201408.mbox/%3CCALwhT94Y64M9keY25Ry_QOLUSZQT29tJQ95twsoa8xXrcNTxpQ%40mail.gmail.com%3E

[2]:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201408.mbox/%3CCALwhT97bM36X6-3%3DcCUwaAKxZ80jfZwuf53BTR7TbWwV5e%2BXkA%40mail.gmail.com%3E
