www-infrastructure-dev mailing list archives

From Jeremy Thomerson <jer...@thomersonfamily.com>
Subject Re: Git, history, protection, and other topics
Date Wed, 04 Nov 2015 13:06:58 GMT
Thank you for the excellent and accurate write-up, Sam. I'm glad to see such a
factual explanation, because it helps avoid the "rewriting is from the
devil" mentality that sometimes pops up, primarily among those who aren't
day-to-day git power users.

If an analogy helps: as Sam points out, with git you never really "rewrite
history". Instead, you just change which version of history is called by
a certain name. In other words, if the tip of my master branch was commit
deadbeef, and I "rewrite history" by rebasing, fixing up a commit, changing
the commit message, etc., and then force push a new commit abcdef1, both
commits (with absolutely all of their history) still exist, and will
until your GC settings prune them. It's just that the reference I call
"master" now points to a different version of history (abcdef1) than it
did previously (deadbeef). It's like having two history books on the
shelf and swapping the covers on them: both versions of history are still
there, just called by different names.
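
A minimal sketch of the analogy, assuming a throwaway local repository in a
temporary directory with the branch pinned to the name master: amending a
commit creates a new object and moves the name, while the old commit can
still be looked up by its hash. The variables old and new are illustrative
stand-ins for deadbeef and abcdef1.

```shell
#!/bin/sh
# Sketch: "rewriting" moves the name "master"; the old commit object stays
# in the object store until garbage collection prunes it.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo

echo one > file
git add file
git commit -q -m "original message"
old=$(git rev-parse master)               # stands in for deadbeef

git commit -q --amend -m "reworded message"
new=$(git rev-parse master)               # stands in for abcdef1

echo "master: $old -> $new"
git cat-file -t "$old"                    # the old object is still there
git log -1 --format=%s "$old"             # ...and still has its old message
```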

In most places I work, we allow this sort of "rewriting" on
work-in-progress branches, but always keep stable, fast-forward-only
branches (no rewriting) as our primary "source of truth" branch(es). That
model works fine for us. I think it could work for the ASF too, and allow
for proper code provenance, as long as projects use a locked branch as the
branch they cut releases from.
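
One way to get the "locked branch" half of this model, sketched under
assumptions: a server-side update hook in the shared bare repository rejects
deletions and non-fast-forward updates of master and of a hypothetical
release/* namespace, while every other branch (here one called wip) stays
open to force pushes. The branch names are illustrative, not an ASF
convention. Git's receive.denyNonFastForwards and receive.denyDeletes
settings do the same job, but for every ref in the repository at once.

```shell
#!/bin/sh
# Sketch: force pushes are allowed on work-in-progress branches, while
# master (and, hypothetically, release/*) only accept fast-forward updates.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q --bare "$top/origin.git"

# git runs hooks/update once per ref as: update <refname> <oldrev> <newrev>;
# a non-zero exit status rejects that ref update. An all-zero id means
# "ref did not exist" (oldrev) or "ref is being deleted" (newrev).
cat > "$top/origin.git/hooks/update" <<'EOF'
#!/bin/sh
refname=$1 oldrev=$2 newrev=$3
case "$refname" in
  refs/heads/master|refs/heads/release/*)
    case "$newrev" in *[!0]*) ;; *)
      echo "deleting $refname is not allowed" >&2; exit 1 ;;
    esac
    case "$oldrev" in *[!0]*)
      if ! git merge-base --is-ancestor "$oldrev" "$newrev"; then
        echo "non-fast-forward update of $refname is not allowed" >&2; exit 1
      fi ;;
    esac
    ;;
esac
exit 0
EOF
chmod +x "$top/origin.git/hooks/update"

git clone -q "$top/origin.git" "$top/work" 2>/dev/null
cd "$top/work"
git symbolic-ref HEAD refs/heads/master
echo v1 > file && git add file && git commit -q -m v1
git push -q origin master
git checkout -q -b wip
echo wip1 > file && git commit -q -a -m wip1
git push -q origin wip

# Rewrite both branches, then try to force push each one.
git commit -q --amend -m "wip1, reworded"
wip_status=0
git push -q --force origin wip || wip_status=$?
git checkout -q master
git commit -q --amend -m "v1, reworded"
master_status=0
git push -q --force origin master || master_status=$?

echo "force push to wip: exit $wip_status"        # expected: 0
echo "force push to master: exit $master_status"  # expected: non-zero
```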

In any event, until the board, with the help of legal advice and in a way
that represents the members, sets a definite policy on what's *required*
for code provenance, it's nearly impossible to have a technical discussion
about how to implement it. It's like writing software with no
requirements, which we all know is a bad idea.

Thanks!
Jeremy Thomerson

On Wed, Nov 4, 2015 at 7:43 AM, Sam Ruby <rubys@intertwingly.net> wrote:

> On Tue, Nov 3, 2015 at 11:08 PM, David Nalley <david@gnsa.us> wrote:
> > Hi folks,
> >
> > So earlier today I sent an email to PMCs@ indicating that we had
> > disabled non-fast-forward (forced) pushes and branch/tag deletion
> > across all of the ASF git repositories. [1]
> >
> > The crux of the problem is that infrastructure had set the expectation
> > that certain branches and tags were protected from force pushes or
> > branch/tag deletion.
> >
> > It was recently discovered that a large number of our projects were
> > doing their main development outside of these protected branches, and
> > not using the release branch and tag scheme that would leave them
> > protected.  Some were using branches with names like 'develop', while
> > others used $project_foo.
> >
> > As a short-term, interim step to allow us to meet the expectation that
> > the main development branches are protected, we blocked non-fast-forward
> > pushes and branch/tag deletion until we can figure out the best way to
> > adequately address the situation.
> >
> > I don't know whether or not the situation is best addressed via policy
> > or technical means, but the discussion here is designed to discover
> > what that should look like, so that we can move past the admittedly
> > blunt, and likely disruptive measure that we introduced today.
> >
> > So, let the discussion begin.
>
> It would be helpful to start with some goals and/or rationale behind
> the current policy.
>
> I'll start with the assumption that "rewrite history" sounds scary,
> and make the case that the term "rewrite history" isn't accurate.
>
> To start with, a git repository is a set of objects.  We will focus on
> commit objects.  Commits are identified by a hash of both content and
> metadata.  Change anything, and you have a new object with a new hash.
> Push a new commit and you have both the old object and the new object
> in the repository.
>
> Commit objects are organized in a directed acyclic graph.  Commits are
> located by references (such as branches and tags).  By default, changing
> a branch reference to anything other than an object that points back to
> the current value of that reference will fail unless --force is
> specified.  Changing references in a way that doesn't connect in this
> manner back to the previous value of the branch can leave orphan
> (disconnected) commit objects.  Such objects can be reclaimed by garbage
> collection in a matter of weeks.
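
The fast-forward rule can be checked directly. A minimal sketch, assuming a
throwaway local repository: git merge-base --is-ancestor answers whether
moving a branch from one commit to another is a fast-forward, and
git fsck --unreachable --no-reflogs lists the commit an amend leaves behind
(the reflog still refers to it, which is why --no-reflogs is needed here).
The helper is_ff is a hypothetical name used only in this sketch.

```shell
#!/bin/sh
# Sketch: moving a branch from A to B is a fast-forward only if A is an
# ancestor of B.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

echo 1 > file && git add file && git commit -q -m first
base=$(git rev-parse HEAD)
echo 2 > file && git commit -q -a -m second
child=$(git rev-parse HEAD)                # a descendant of base
git commit -q --amend -m "second, reworded"
rewritten=$(git rev-parse HEAD)            # a sibling of child, not a descendant

# hypothetical helper: prints yes if $1 -> $2 would be a fast-forward
is_ff() { if git merge-base --is-ancestor "$1" "$2"; then echo yes; else echo no; fi; }
ff_child=$(is_ff "$base" "$child")
ff_rewritten=$(is_ff "$child" "$rewritten")
echo "base -> child is a fast-forward: $ff_child"
echo "child -> rewritten is a fast-forward: $ff_rewritten"

# child is now referenced only by reflog entries; ignoring those, fsck
# reports it as unreachable, i.e. a candidate for pruning by gc.
git fsck --unreachable --no-reflogs 2>/dev/null | grep "^unreachable commit $child"
```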
>
> Additionally, reflogs can be created to track all changes to
> references.  This will enable you to use expressions such as
> master@{1.week.ago} to see what a reference pointed to at an arbitrary
> point in time.
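
A short sketch of reading a reflog, assuming a throwaway local repository
(non-bare repositories keep reflogs by default; bare ones need
core.logAllRefUpdates). master@{1} names the previous value of master; a
date such as master@{1.week.ago} is resolved the same way, but needs a
week of history to be interesting.

```shell
#!/bin/sh
# Sketch: a reflog records every value a reference has had, so the value
# that a rewrite replaced can still be named and restored.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

echo one > file
git add file
git commit -q -m first
before=$(git rev-parse master)
git commit -q --amend -m "first, reworded"

git reflog show master                    # newest entry first
previous=$(git rev-parse 'master@{1}')    # the value master had before the amend
echo "master@{1} = $previous"

# Undoing the rewrite is just another reference update.
git reset -q --hard "$previous"
```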
>
> All of this behavior is configurable.  Set gc.pruneExpire to an
> arbitrarily large value (or "never") to disable reclaiming of orphaned
> objects.  Set gc.auto to 0 to disable automatic garbage collection.
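
Those settings can be applied to a shared bare repository like this. It is a
sketch, not a recommendation of specific values: gc.pruneExpire,
gc.reflogExpire and gc.reflogExpireUnreachable all accept "never" to switch
expiry off, and gc.auto 0 only stops automatic gc runs, so the "never"
values are what keep objects around if someone runs git gc by hand.

```shell
#!/bin/sh
# Sketch: settings for a shared bare repository that should keep every
# object and every reflog entry indefinitely.
set -eu
repo="$(mktemp -d)/origin.git"
git init -q --bare "$repo"
cd "$repo"
git config core.logAllRefUpdates true         # bare repositories keep no reflogs by default
git config gc.auto 0                          # no automatic "git gc --auto" runs
git config gc.pruneExpire never               # a manual gc keeps unreachable objects
git config gc.reflogExpire never              # keep entries still reachable from the ref
git config gc.reflogExpireUnreachable never   # keep entries for rewritten-away commits
git config --local --list | grep -i -e '^core\.logallrefupdates' -e '^gc\.'
```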
>
> Git also has a large number of 'hooks' that can be used to trigger
> things like sending emails.  I suggest that instead of disallowing
> behaviors like forced pushes, we notify the appropriate people when
> one happens, and maintain reflogs so that such changes can be examined
> and possibly undone.  As with most things in git, how long reflogs are
> retained is configurable (gc.reflogExpireUnreachable and
> gc.reflogExpire).
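
A minimal sketch of the notify-instead-of-forbid approach, with loud
assumptions: a post-receive hook in the shared bare repository reads one
"oldrev newrev refname" line per updated ref, uses
git merge-base --is-ancestor to spot non-fast-forward updates and
deletions, and appends them to a file named forced-pushes.log inside the
repository. That file is a hypothetical stand-in for sending email; a real
setup would hand the same text to whatever mailer the infrastructure uses.

```shell
#!/bin/sh
# Sketch: accept force pushes, but record them so people can be notified.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q --bare "$top/origin.git"
git --git-dir="$top/origin.git" config core.logAllRefUpdates true

# post-receive gets "<oldrev> <newrev> <refname>" lines on stdin after the
# refs are updated; in a bare repository the hook runs inside the repository.
cat > "$top/origin.git/hooks/post-receive" <<'EOF'
#!/bin/sh
while read -r oldrev newrev refname; do
  case "$oldrev" in *[!0]*) ;; *) continue ;; esac   # ref was just created
  case "$newrev" in *[!0]*) ;; *)                      # ref was deleted
    echo "deleted $refname (was $oldrev)" >> forced-pushes.log; continue ;;
  esac
  if ! git merge-base --is-ancestor "$oldrev" "$newrev"; then
    echo "forced update of $refname: $oldrev -> $newrev" >> forced-pushes.log
  fi
done
EOF
chmod +x "$top/origin.git/hooks/post-receive"

git clone -q "$top/origin.git" "$top/work" 2>/dev/null
cd "$top/work"
git symbolic-ref HEAD refs/heads/master
echo one > file && git add file && git commit -q -m one
git push -q origin master                  # new ref: not logged
git commit -q --amend -m "one, reworded"
git push -q --force origin master          # non-fast-forward: logged

cat "$top/origin.git/forced-pushes.log"
```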
>
> A few links:
>
> https://git-scm.com/docs/git-reflog
> https://git-scm.com/docs/git-gc
>
> http://alblue.bandlem.com/2011/11/git-tip-of-week-gc-and-pruning-this.html
> http://alblue.bandlem.com/2011/05/git-tip-of-week-reflogs.html
>
> At the bottom of this email is a script that will demonstrate that
> orphaned objects still remain in the repository.
>
> - Sam Ruby
>
> > --David
> >
> > [1] https://git-wip-us.apache.org/docs/switching-to-git.html#protected-ref-lists
>
> ---
>
> #!/bin/bash -x
>
> #
> # Overview:
> #
> # * a git repository is a collection of commit objects and labels.  These
> #   objects are linked into a directed acyclic graph.
> #
> # * commit contents and metadata (including timestamps) are hashed to
> #   produce commit names.
> #
> # * operations like 'rebase' mass produce new commit objects and move
> #   the label.
> #
> # * pushes (without --force) will fail unless the commit at the target
> #   label is in the direct history of the commit being pushed as the new
> #   value for this label.
> #
> # * pushes (with --force) won't delete old nodes; instead they will be
> #   orphaned and not deleted until garbage collection is run.  gc can
> #   be disabled.
> #
> # * fsck can find and report on orphan nodes.  reflogs can keep track
> #   of all nodes.
> #
>
> rm -rf rewritedemo
> mkdir rewritedemo
> cd rewritedemo
>
> # shared "server" repository: log every ref update, never auto-gc
> git init --bare origin.git
> cd origin.git
> git config core.logAllRefUpdates true
> git config gc.auto 0
> cd ..
>
> echo "**********************************************************************"
>
> # first clone: create two commits and push them
> git clone origin.git clone1
> cd clone1
> echo test1 > file
> git add file
> git commit -a -m test1
> git push origin HEAD
> echo test2 > file
> git commit -a -m test2
> git push origin HEAD
> git log --pretty=raw
>
> echo "**********************************************************************"
>
> cd ../origin.git
> git log --pretty=raw
> git fsck
>
> echo "**********************************************************************"
>
> # second clone: replace the tip commit, then push the replacement
> cd ..
> git clone origin.git clone2
> cd clone2
> echo test3 > file
> git commit --amend -a -m test3
> git push origin HEAD           # rejected: not a fast-forward
> git push --force origin HEAD   # accepted: the old tip (test2) is orphaned
> git log --pretty=raw
>
> echo "**********************************************************************"
>
> # the old tip is still in the repository: fsck reports it as dangling
> # once reflogs are ignored, and the reflog still lists it
> cd ../origin.git
> git log --pretty=raw
> git fsck --no-reflogs
> git reflog show --all --pretty=raw
>
