www-infrastructure-dev mailing list archives

From Sam Ruby <ru...@intertwingly.net>
Subject Re: Git, history, protection, and other topics
Date Wed, 04 Nov 2015 12:43:02 GMT
On Tue, Nov 3, 2015 at 11:08 PM, David Nalley <david@gnsa.us> wrote:
> Hi folks,
>
> So earlier today I sent an email to PMCs@ indicating that we had
> disabled non-fast-forward pushes and branch/tag deletion across
> all of the ASF git repositories. [1]
>
> The crux of the problem is that infrastructure had set the expectation
> that certain branches and tags were protected from force pushes or
> branch/tag deletion.
>
> It was recently discovered that a large number of our projects were
> doing their main branch of development outside of these protected
> branches, and not using the release branch and tag scheme that would
> leave them protected. Some were using branches with names like
> 'develop' while others had $project_foo.
>
> As a short-term, interim step to meet the expectation that the main
> development branches are protected, we blocked non-fast-forward pushes
> and branch/tag deletion until we can figure out the best way to
> adequately address the situation.
>
> I don't know whether or not the situation is best addressed via policy
> or technical means, but the discussion here is designed to discover
> what that should look like, so that we can move past the admittedly
> blunt, and likely disruptive measure that we introduced today.
>
> So; let the discussions begin.
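The message doesn't say how the interim block was implemented. On a stock git server, a comparable blanket effect comes from two receive-side settings, receive.denyNonFastForwards and receive.denyDeletes. The sketch below assumes only a local git install. The scratch directory, the identity, and the branch name "feature" are made up for the demonstration.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
# Throwaway identity (an assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q --bare origin.git
# Refuse any ref update that is not a fast-forward (even with --force)
# and refuse any ref deletion.
git -C origin.git config receive.denyNonFastForwards true
git -C origin.git config receive.denyDeletes true

git clone -q origin.git work 2>/dev/null
cd work
echo one > file
git add file
git commit -q -m one
# Create the current branch and a hypothetical "feature" branch on the server.
git push -q origin HEAD HEAD:refs/heads/feature

git commit -q --amend -m one-reworded
if git push -q --force origin HEAD 2>/dev/null; then forced=accepted; else forced=rejected; fi
if git push -q origin --delete feature 2>/dev/null; then deleted=accepted; else deleted=rejected; fi
echo "forced push: $forced; branch deletion: $deleted"
```

Both settings apply to every ref in the repository. That is what makes this approach blunt compared with a per-branch protection list.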

It would be helpful to start with some goals and/or rationale behind
the current policy.

I'll start with the assumption that "rewrite history" sounds scary.
I'm going to make the case that the term "rewrite history" isn't accurate.

To start with, a git repository is a set of objects.  We will focus on
commit objects.  Commits are identified by a hash of both content and
metadata.  Change anything, and you have a new object with a new hash.
Push a new commit and you have both the old object and new object in
the repository.
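You can check this for yourself. Re-hashing a commit's raw object reproduces its name. Rewording only the message produces a different commit, and the first one stays in the object store. The sketch assumes a local git install; the scratch repository and identity are illustrative.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
# Throwaway identity (an assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
echo test1 > file
git add file
git commit -q -m test1
first=$(git rev-parse HEAD)

# Hash the raw commit object (tree, parents, author/committer lines with
# timestamps, message); the result is the commit's own name.
rehashed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "rev-parse: $first"
echo "re-hashed: $rehashed"

# Change only the message: git writes a second commit object.
git commit -q --amend -m "test1, reworded"
second=$(git rev-parse HEAD)
echo "amended:   $second"

# The first object is still in the object store.
git cat-file -t "$first"    # prints: commit
```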

Commit objects are organized in a directed acyclic graph.  Commits are
located by references (such as branches and tags).  By default, a push
that moves a branch to a commit that does not have the branch's current
value as an ancestor will fail unless --force is specified.  Changing
references in a way that doesn't connect back to the previous value of
the branch can leave orphan (disconnected) commit objects.  Such objects
can be reclaimed by garbage collection in a matter of weeks.
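The fast-forward rule can be checked directly with git merge-base --is-ancestor, which succeeds when the first commit is an ancestor of the second. This sketch approximates the test applied to a branch update pushed without --force. The helper name is_fast_forward is hypothetical. The sketch also shows that the commit left behind by an amend is on no branch but is still present.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
# Throwaway identity (an assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
echo a > file
git add file
git commit -q -m a
a=$(git rev-parse HEAD)
echo b > file
git commit -q -a -m b
b=$(git rev-parse HEAD)
git commit -q --amend -m b-reworded   # the branch now points at a new commit
b2=$(git rev-parse HEAD)

# Hypothetical helper: moving a branch from $1 to $2 is a fast-forward
# when $1 is an ancestor of $2 (roughly the check a push without --force
# must pass for a branch).
is_fast_forward() { git merge-base --is-ancestor "$1" "$2"; }

if is_fast_forward "$a" "$b"; then echo "a -> b: fast-forward"; fi
if ! is_fast_forward "$b" "$b2"; then echo "b -> b2: needs --force"; fi

# b is now orphaned: no branch contains it, yet the object is still here.
git branch --contains "$b"    # prints nothing
git cat-file -t "$b"          # prints: commit
```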

Additionally, reflogs can be enabled to record every change to a
reference.  This lets you use expressions such as <ref>@{1.week.ago}
(for example, master@{1.week.ago}) to see what a reference pointed to
at an arbitrary point in time.
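Here is a sketch of reading a reflog, assuming a local git install and a throwaway repository. <ref>@{n} gives the value the reference had n updates ago. The date form does the same lookup by time once the log covers that period.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
# Throwaway identity (an assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
# Already the default in a repository with a working tree; a bare server
# repository needs it set explicitly.
git config core.logAllRefUpdates true

echo one > file
git add file
git commit -q -m one
echo two > file
git commit -q -a -m two
git commit -q --amend -m two-reworded
branch=$(git symbolic-ref --short HEAD)

# Every update of the branch is recorded, newest first.
git reflog show "$branch"

# <ref>@{n} is the value the ref had n updates ago; <ref>@{<date>}, such
# as "$branch@{1.week.ago}", is the value it had at that time.
before_amend=$(git rev-parse "$branch@{1}")
git log -1 --format=%s "$before_amend"   # prints: two
```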

All of this behavior is configurable.  Set gc.pruneExpire to an
arbitrarily large value (or to "never") to disable reclaiming of orphan
objects.  Set gc.auto to 0 to disable automatic garbage collection.
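Here are the same settings in config form, applied to a throwaway repository as a sketch. The values are one possible choice; gc.pruneExpire also accepts a date such as "1.year.ago". An orphaned commit survives an explicit git gc. In this small demo the reflog would also keep it alive, so the sketch illustrates the configuration rather than isolating gc.pruneExpire.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
# Throwaway identity (an assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
git config gc.pruneExpire never   # do not delete unreachable objects
git config gc.auto 0              # do not run gc automatically

echo one > file
git add file
git commit -q -m one
echo two > file
git commit -q -a -m two
orphan=$(git rev-parse HEAD)
git reset -q --hard HEAD~1        # "two" is no longer on any branch

# Even an explicit gc keeps it (here the reflog would also protect it).
git gc -q
git cat-file -t "$orphan"         # prints: commit
```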

Git also has a large number of 'hooks' that can be used to trigger
things like sending of emails.  I suggest that instead of disallowing
behaviors like forced pushes, we notify the appropriate people when
this is done, and maintain reflogs so that such changes can be examined
and possibly undone.  As with most things with git, how long reflogs are
retained is configurable (gc.reflogExpireUnreachable and
gc.reflogExpire).
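Here is a sketch of the notify-instead-of-forbid approach. A post-receive hook runs after the refs have been updated and receives one "<old> <new> <refname>" line per updated ref on stdin. It can therefore flag deletions (new value all zeros) and non-fast-forward updates. The notify function is a hypothetical placeholder that only writes to stderr, which git relays to the pusher as "remote:" lines; sending mail to a list would replace it. The paths and identity are made up for the demo.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
# Throwaway identity (an assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q --bare origin.git
git -C origin.git config core.logAllRefUpdates true   # keep a server-side reflog

# Report, rather than forbid, deletions and non-fast-forward updates.
cat > origin.git/hooks/post-receive <<'EOF'
#!/bin/bash
notify() { echo "NOTICE: $*" >&2; }   # hypothetical stand-in for mail
is_zero() { case $1 in *[!0]*) return 1 ;; *) return 0 ;; esac; }
while read -r old new ref; do
  if is_zero "$new"; then
    notify "$ref deleted (was $old)"
  elif ! is_zero "$old" && ! git merge-base --is-ancestor "$old" "$new"; then
    notify "$ref force-updated: $old -> $new"
  fi
done
EOF
chmod +x origin.git/hooks/post-receive

git clone -q origin.git work 2>/dev/null
cd work
echo one > file
git add file
git commit -q -m one
git push -q origin HEAD               # new branch: no notice
git commit -q --amend -m one-reworded
output=$(git push --force origin HEAD 2>&1)
echo "$output"                        # includes a "remote: NOTICE: ..." line
```

Because the old commit is still in the repository when the hook runs, the notice carries both hashes, and the server-side reflog keeps a record that an administrator could use to restore the branch.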

A few links:

https://git-scm.com/docs/git-reflog
https://git-scm.com/docs/git-gc

http://alblue.bandlem.com/2011/11/git-tip-of-week-gc-and-pruning-this.html
http://alblue.bandlem.com/2011/05/git-tip-of-week-reflogs.html

At the bottom of this email is a script that will demonstrate that
orphaned objects still remain in the repository.

- Sam Ruby

> --David
>
> [1] https://git-wip-us.apache.org/docs/switching-to-git.html#protected-ref-lists

---

#!/bin/bash -x

#
# Overview:
#
# * a git repository is a collection of commit objects and labels.  These
#   objects are linked into a directed acyclic graph.
#
# * commit contents and metadata (including timestamps) are hashed to produce
#   commit names.
#
# * operations like 'rebase' mass produce new commit objects and move
#   the label.
#
# * pushes (without --force) will fail unless the commit at the target label
#   is in the direct history of the commit being pushed as the new value for
#   this label.
#
# * pushes (with --force) won't delete old nodes, instead they will be
#   orphaned and not deleted until garbage collection is run.  gc can
#   be disabled.
#
# * fsck can find and report on orphan nodes.  reflogs can keep track
#   of all nodes.
#

rm -rf rewritedemo
mkdir rewritedemo
cd rewritedemo

git init --bare origin.git
cd origin.git
git config core.logAllRefUpdates true
git config gc.auto 0
cd ..

echo "**********************************************************************"

git clone origin.git clone1
cd clone1
echo test1 > file
git add file
git commit -a -m test1
git push
echo test2 > file
git commit -a -m test2
git push
git log --pretty=raw

echo "**********************************************************************"

cd ../origin.git
git log --pretty=raw
git fsck

echo "**********************************************************************"

cd ..
git clone origin.git clone2
cd clone2
echo test3 > file
git commit --amend -a -m test3
git push            # rejected: not a fast-forward
git push --force
git log --pretty=raw

echo "**********************************************************************"

cd ../origin.git
git log --pretty=raw
git fsck
git reflog show --all --pretty=raw
