From: Jing Zhao
Date: Mon, 17 Aug 2015 13:18:13 -0700
Subject: Re: [DISCUSS] git rebase vs. git merge for branch development
To: common-dev@hadoop.apache.org

I think we should allow merge-based workflows. I have worked, and am still working, on several big feature branches, including HDFS-2802 (>100 subtasks) and HDFS-7285 (already >200 subtasks), and have tried both the merge-based and rebase-based workflows. When the feature change becomes big, rebasing becomes a big pain: a small change in trunk can cause conflicts in many of the feature branch's commits as each one is replayed during the rebase. Using "git merge" to merge trunk changes into the feature branch is much easier in this case.

Thanks,
-Jing

On Mon, Aug 17, 2015 at 12:17 PM, Andrew Wang wrote:

> Hi all,
>
> I've thought about this topic more over the last week, and felt I should play devil's advocate for a merge workflow.
> A few comments:
>
> - The issue of merges "polluting history" is mainly an issue when using a GitHub PR workflow, which results in one merge per PR. Clearly this is not okay, but it is a separate issue from feature branches. We only have a handful of merge commits per feature branch.
> - The issue of changes hiding in merge commits can happen when resolving rebase conflicts too, except it's harder to track. Right now neither goes through code review, which is sketchy. We probably should review these too, and it's easier to review a single merge commit vs. an entire rebased branch. Merge is also a more natural way of integrating changes from trunk, since you just resolve all conflicts at once at the end.
> - Merge gives us a linear history on the branch but worse history on trunk/branch-2. Rebase has worse history on the branch but a linear history on trunk/branch-2. This means that for quick/small feature branches that don't have a lot of conflicts, rebase is preferred. For large features with lots of conflicts, merge is preferred. This is basically what we're running into on HDFS-7285.
> - Rebase also comes with increased coordination costs, since public history is being rewritten. This is again okay for smaller efforts (where there are fewer contributors), but more painful with bigger ones. There have been a number of HDFS-7285 branches created basically as a result of rebase, with corresponding JIRA discussions about where to commit things.
> - The issue of a single squashed commit for the branch-2 backport is arguably an issue with how we structure our branches. If release branches forked off of trunk rather than branch-2, we wouldn't have this problem. We could require branch-2 integration to also happen via git merge. Or we kick trunk out to a feature branch based off of branch-2. Or we shrug and keep the status quo.
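The difference between the two uprev styles debated here can be seen in a scratch repository. What follows is a minimal sketch, assuming a throwaway repo created with `mktemp -d` and hypothetical branch names (`trunk`, `feature-merge`, `feature-rebase`). `git merge` keeps the existing feature commits unchanged and records one merge commit with two parents, while `git rebase` replays each feature commit onto the new trunk tip and gives it a new hash, which is why rebasing a shared branch rewrites public history.

```shell
#!/bin/sh
# Minimal sketch with hypothetical names: merge-based vs rebase-based
# uprev of a feature branch, in a throwaway repository.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk        # call the initial branch "trunk"
git config user.name "Example Dev"            # local identity for the demo commits
git config user.email "dev@example.invalid"

echo base > base.txt
git add base.txt
git commit -q -m "trunk: base"

# Two commits of feature work on a branch forked from trunk.
git checkout -q -b feature-merge
echo f1 > feature.txt
git add feature.txt
git commit -q -m "feature: step 1"
echo f2 >> feature.txt
git commit -q -a -m "feature: step 2"
git branch feature-rebase                     # identical copy for the rebase variant
before=$(git rev-parse feature-rebase)        # feature tip before any uprev

# Meanwhile trunk moves on.
git checkout -q trunk
echo t1 > trunk.txt
git add trunk.txt
git commit -q -m "trunk: new change"

# Variant 1: merge trunk into the feature branch (one merge commit).
git checkout -q feature-merge
git merge -q --no-edit trunk
merge_parents=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')

# Variant 2: rebase the feature branch onto trunk (commits are replayed).
git checkout -q feature-rebase
git rebase -q trunk
after=$(git rev-parse HEAD)

echo "merge commit has $merge_parents parents"
if git merge-base --is-ancestor "$before" feature-merge; then
  echo "merge kept the original feature commits"
fi
if [ "$before" != "$after" ]; then
  echo "rebase replaced them with new commits"
fi
```

With many feature commits, the rebase replays each one, so a trunk change that conflicts with several of them has to be resolved at each replay, whereas the merge resolves it once.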
>
> I'd definitely appreciate commentary from others who've worked on feature branches in git, even in communities outside of Hadoop.
>
> If there is support for allowing merge-based workflows in addition to rebase, we'd need to kick off a [VOTE] thread, since the last [VOTE] only allows rebase.
>
> Best,
> Andrew
>
> On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang wrote:
>
> > @Sangjin,
> >
> > I believe this is covered by the [VOTE] I linked to above, key excerpt being:
> >
> > 3. Force-push on feature-branches is allowed. Before pulling in a feature, the feature-branch should be rebased on latest trunk and the changes applied to trunk through "git rebase --onto" or "git cherry-pick ".
> >
> > This specifies that the last uprev, i.e. the final integration of the branch into trunk, happens with rebase. It doesn't say anything about the periodic uprevs, but it'd be very strange to merge periodically and then rebase once at the end. So I take it to mean doing periodic uprevs with rebase too.
> >
> > On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee wrote:
> >
> >> Just to be clear, are we discussing the process of uprev'ing the feature development branch with the latest from the trunk from time to time, or making the final merge of the feature branch onto the trunk?
> >>
> >> On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran <stevel@hortonworks.com> wrote:
> >>
> >> > I haven't done a big piece of work in the ASF code repo since the migration to git, though I did in the svn era.
> >> >
> >> > Currently, with private git repos:
> >> > - anyone gets SCM control of their source
> >> > - you can commit for your own reasons (about to make a change, want a private jenkins run, ...) and gain from having many small checkins. More succinctly: if you aren't checking in your work 2+ times a day, why not?
> >> > - rebasing is a painful necessity on personal, private branches, to keep the final patch to hadoop git a single diff
> >> >
> >> > With the private git process that's the de facto standard, we lose history anyway. I know what I've done, and somewhere there's a tag in my own github repo marking the work behind a JIRA. But we don't always need that entire history of "trying to debug kerberos", "typo in exception", and other stuff that accrues during the work.
> >> >
> >> > I therefore think I'm in favour of big squash commits. What we could do is extend that with a policy of:
> >> >
> >> > 1. Tag the final commit used to make the patch, something like tag_HADOOP-8192. The tag ensures that the history isn't gc'd.
> >> > 2. Delete the branch (keeps the # of branches down).
> >> > 3. In the JIRA, include the name of the tag and the git commit hash in the comments. Someone curious can rebuild that history.
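Steve's three-step policy maps onto ordinary git commands. This is a minimal sketch under stated assumptions: a scratch clone whose `origin` is a local bare repository standing in for the shared remote, and a hypothetical branch name `HADOOP-8192-work`; only the tag name `tag_HADOOP-8192` comes from his example. It tags the branch tip, pushes the tag, deletes the branch locally and on the remote, and prints the commit hash to quote in the JIRA.

```shell
#!/bin/sh
# Minimal sketch of a "tag, delete branch, record in JIRA" policy.
# Assumptions: a local bare repo stands in for the shared remote, and the
# branch name HADOOP-8192-work is hypothetical.
set -e

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/clone" 2>/dev/null   # empty-clone warning silenced
cd "$work/clone"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Some private work history on a feature branch.
git checkout -q -b HADOOP-8192-work
for msg in "trying to debug kerberos" "typo in exception" "final fix"; do
  echo "$msg" >> notes.txt
  git add notes.txt
  git commit -q -m "$msg"
done
git push -q origin HADOOP-8192-work

# 1. Tag the final commit; the tag keeps these commits reachable,
#    so gc will not prune them once the branch is gone.
git tag -a tag_HADOOP-8192 -m "History behind the HADOOP-8192 patch" HADOOP-8192-work
git push -q origin tag_HADOOP-8192

# 2. Delete the branch locally and on the remote.
git checkout -q --detach                      # cannot delete the checked-out branch
git branch -q -D HADOOP-8192-work
git push -q origin --delete HADOOP-8192-work

# 3. Print what to quote in the JIRA comment.
commit=$(git rev-parse "tag_HADOOP-8192^{commit}")
echo "JIRA note: tag tag_HADOOP-8192, commit $commit"

# Anyone curious can later rebuild the history from the tag.
git --no-pager log --oneline tag_HADOOP-8192
```

An annotated tag is used here so that the tag itself records who created it and why; a lightweight tag would keep the history reachable just as well.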