From: Jeremy Thomerson
Date: Wed, 4 Nov 2015 08:06:58 -0500
Subject: Re: Git, history, protection, and other topics
To: infrastructure-dev@apache.org

Thank you for the excellent and accurate write-up, Sam. I'm happy to see
the factualness and accuracy of your email because it helps avoid the
"rewriting is from the devil" mentality that sometimes pops up,
primarily among those who aren't day-to-day git power users.

If an analogy helps: with git you never really "rewrite history"... as
Sam points out. Instead, you just change which version of history is
called by a certain name. In other words, if the tip of my master branch
was commit deadbeef, and I "rewrite history" by rebasing, fixing up a
commit, changing the commit message, etc., and now force push a new
commit abcdef1, both commits - with absolutely all of their history -
still exist (and will until your GC settings prune them). It's just that
the reference that I call "master" now points to a different version of
history (abcdef1) than it did previously (deadbeef). It's like having
two history books on the shelf and swapping the covers on them - both
versions of history are still there, just called by different names.

In most places I work, we allow this sort of "rewriting" on
work-in-progress branches but always keep stable, no-rewrite,
fast-forward-only branches around as our primary "source of truth"
branch(es). That model works fine for us.
I think it could work for the ASF too, and allow for proper code
provenance, as long as projects are using a locked branch as the branch
they cut releases from.

In any event, until the board, with the help of legal advice and in a
way that represents the members, sets the definite policy on what's
*required* for code provenance, it's nearly impossible to have any
technical discussions about how to implement what's required. It's like
writing software with no requirements - which we all know is a bad idea.

Thanks!
Jeremy Thomerson

On Wed, Nov 4, 2015 at 7:43 AM, Sam Ruby wrote:
> On Tue, Nov 3, 2015 at 11:08 PM, David Nalley wrote:
> > Hi folks,
> >
> > So earlier today I sent an email to PMCs@ indicating that we had
> > disabled non-fast-forward pushes and branch/tag deletion across
> > all of the ASF git repositories. [1]
> >
> > The crux of the problem is that infrastructure had set the expectation
> > that certain branches and tags were protected from force pushes or
> > branch/tag deletion.
> >
> > It was recently discovered that a large number of our projects were
> > doing their main branch of development outside of these protected
> > branches, and not using the release branch and tag scheme that would
> > leave them protected. Some were using branches with names like
> > 'develop' while others had $project_foo.
> >
> > As a short-term, interim step to allow us to meet the expectation that
> > the main development branches are protected, we blocked
> > non-fast-forward pushes and branch/tag deletion until we can figure
> > out the best way to adequately address the situation.
> >
> > I don't know whether or not the situation is best addressed via policy
> > or technical means, but the discussion here is designed to discover
> > what that should look like, so that we can move past the admittedly
> > blunt, and likely disruptive, measure that we introduced today.
> >
> > So; let the discussions begin.
>
> It would be helpful to start with some goals and/or rationale behind
> the current policy.
>
> I'll start with the assumption that "rewrite history" sounds scary.
> I'm going to make the case that the term "rewrite history" isn't
> accurate.
>
> To start with, a git repository is a set of objects. We will focus on
> commit objects. Commits are identified by a hash of both content and
> metadata. Change anything, and you have a new object with a new hash.
> Push a new commit and you have both the old object and new object in
> the repository.
>
> Commit objects are organized in a directed acyclic graph. Commits are
> located by references (such as branches and tags). By default,
> changing a branch reference to anything other than an object that
> points back to the current value of that reference will fail unless
> --force is specified. Changing references in a way that doesn't
> connect in this manner back to the previous value of the branch can
> leave orphan (disconnected) commit objects. Such objects can be
> reclaimed by garbage collection in a matter of weeks.
>
> Additionally, reflogs can be created to track all changes to
> references. This will enable you to use expressions such as
> tag@{1.week.ago} to see what a tag pointed to at an arbitrary
> point in time.
>
> All of this behavior is configurable. Change gc.pruneexpire to an
> arbitrarily large value to disable reclaiming of orphan objects. Set
> gc.auto to 0 to disable automatic garbage collection.
>
> Git also has a large number of 'hooks' that can be used to trigger
> things like sending of emails. I suggest that instead of disallowing
> behaviors like forced pushes, we notify the appropriate people when
> this is done, and maintain reflogs to enable such changes to be
> examined and possibly undone. As with most things with git, how long
> reflogs are retained is configurable (gc.reflogExpireUnreachable and
> gc.reflogExpire).
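
A minimal sketch of the "notify instead of disallow" idea above, assuming
the check runs where a server-side hook could call it. The function name
classify_update, its labels, and the throwaway repository are
illustrative assumptions, not anything the ASF infrastructure is known
to use. It relies only on `git merge-base --is-ancestor` and on the
all-zero object name git uses for ref creation and deletion.

```shell
#!/bin/bash
# Hypothetical helper: classify a ref update from its old and new values.
classify_update() {
  local old="$1" new="$2"
  case "$old" in *[!0]*) ;; *) echo create; return ;; esac   # old is all zeros
  case "$new" in *[!0]*) ;; *) echo delete; return ;; esac   # new is all zeros
  if git merge-base --is-ancestor "$old" "$new"; then
    echo fast-forward      # old value is in the history of the new value
  else
    echo forced            # old value would be left behind by this update
  fi
}

# Throwaway repository to exercise the helper (paths and identity are made up).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

echo test1 > file; git add file; git commit -q -m test1
first=$(git rev-parse HEAD)
echo test2 > file; git commit -q -a -m test2
second=$(git rev-parse HEAD)
echo test3 > file; git commit -q -a --amend -m test3
amended=$(git rev-parse HEAD)

ff=$(classify_update "$first" "$second")
forced=$(classify_update "$second" "$amended")
echo "first -> second:   $ff"
echo "second -> amended: $forced"
```

In an update hook git passes the ref name, old value, and new value as
arguments, and a post-receive hook reads the same three fields from
standard input, so a helper like this could decide when to send a
notification rather than reject the push.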
>
> A few links:
>
> https://git-scm.com/docs/git-reflog
> https://git-scm.com/docs/git-gc
>
> http://alblue.bandlem.com/2011/11/git-tip-of-week-gc-and-pruning-this.html
> http://alblue.bandlem.com/2011/05/git-tip-of-week-reflogs.html
>
> At the bottom of this email is a script that will demonstrate that
> orphaned objects still remain in the repository.
>
> - Sam Ruby
>
> > --David
> >
> > [1] https://git-wip-us.apache.org/docs/switching-to-git.html#protected-ref-lists
>
> ---
>
> #!/bin/bash -x
>
> #
> # Overview:
> #
> # * a git repository is a collection of commit objects and labels. These
> #   objects are linked into a directed acyclic graph.
> #
> # * commit contents and metadata (including timestamps) are hashed to
> #   produce commit names.
> #
> # * operations like 'rebase' mass produce new commit objects and move
> #   the label.
> #
> # * pushes (without --force) will fail unless the commit at the target
> #   label is in the direct history of the commit being pushed as the new
> #   value for this label.
> #
> # * pushes (with --force) won't delete old nodes, instead they will be
> #   orphaned and not deleted until garbage collection is run. gc can
> #   be disabled.
> #
> # * fsck can find and report on orphan nodes. reflogs can keep track
> #   of all nodes.
> #
>
> rm -rf rewritedemo
> mkdir rewritedemo
> cd rewritedemo
>
> git init --bare origin.git
> cd origin.git
> git config core.logAllRefUpdates true
> git config gc.auto 0
> cd ..
>
> echo "**********************************************************************"
>
> git clone origin.git clone1
> cd clone1
> echo test1 > file
> git add file
> git commit -a -m test1
> git push
> echo test2 > file
> git commit -a -m test2
> git push
> echo test2 > file
> git log --pretty=raw
>
> echo "**********************************************************************"
>
> cd ../origin.git
> git log --pretty=raw
> git fsck
>
> echo "**********************************************************************"
>
> cd ..
> git clone origin.git clone2
> cd clone2
> echo test3 > file
> git commit --amend -a -m test3
> git push            # expected to be rejected: not a fast-forward
> git push --force    # moves the label; the test2 commit stays in origin.git
> git log --pretty=raw
>
> echo "**********************************************************************"
>
> cd ../origin.git
> git log --pretty=raw
> git fsck
> git reflog show --all --pretty=raw
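
For anyone who wants the "swapping the covers" point without the full
bare-repository setup, here is a shorter local sketch. It assumes a
throwaway directory and a placeholder identity, and the branch name
"rescued" is made up for illustration. It amends a commit, then checks
that the old commit object is still present, that the reflog remembers
where HEAD pointed before, and that a new label can be attached to the
old version of history.

```shell
#!/bin/bash
# Throwaway repository (location and identity are assumptions for the demo).
work=$(mktemp -d)
cd "$work" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

echo test2 > file; git add file; git commit -q -m test2
old=$(git rev-parse HEAD)          # the "deadbeef" of the analogy

echo test3 > file; git commit -q -a --amend -m test3
new=$(git rev-parse HEAD)          # the "abcdef1" of the analogy

# The old object is still in the object store after the amend.
old_type=$(git cat-file -t "$old")

# The reflog records where HEAD pointed before the amend.
previous=$(git rev-parse 'HEAD@{1}')

# Put a second cover on the shelf: a new label for the old history.
git branch rescued "$old"
rescued=$(git rev-parse rescued)

echo "old=$old new=$new"
echo "old object type: $old_type"
echo "HEAD@{1}: $previous"
echo "rescued: $rescued"
```

How long the old commit stays recoverable this way depends on the gc and
reflog expiry settings discussed in the quoted message.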