www-infrastructure-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: @apache.org commit address requirement (Was: Git hosting is go)
Date Wed, 14 Dec 2011 22:24:18 GMT
On Wed, Dec 14, 2011 at 1:15 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
> On Wed, Dec 14, 2011 at 6:04 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> The ref updates are already logged with the change, the ldap
>> authorized id, time and ip of who pushed so that should be sufficient
>> there.
> That information is enough to determine who pushed each commit, but it
> could be rather tricky. For example, consider a case where a branch
> starts at commit X and then someone pushes it to X -> Y -> Z. This
> would be recorded as a change from X to Z for that branch. Now if
> someone else creates a new branch at Y and makes a push, from the
> ref-updates.log it'll look like that's when the commit entered the
> repository.

To be more concrete here, an actual ref-update.log entry looks like this:

[Wed Dec 14 05:55:16 2011] refs/heads/master 0658d982e4 -> 2d90a1249d

And a branch creation looks like this:

[Sun Nov 27 01:08:05 2011] refs/heads/COUCHDB-431 0000000000 ->
8de5e722ea randall@http.

If I read your case right, then the confusion would be when 8de5e722ea
already exists in the repository.

To say that this makes it appear that 8de5e722ea didn't exist in the
repository I think is juts confusion on what we're logging. These are
just ref updates which contain all the required information to rebuild
when commits entered the repository.

I do agree though that mapping back to when things were uploaded would
take more effort on the analysis side but the trade off is that the
server hooks get a bit more complicated. In this particular case I'd
lean towards keeping the server side as simple and robust as possible
and then if anyone needs to do the analysis of what commits were
pushed at any given point they can do the (somewhat less trivial) log

> Cases like that could be resolved by looking at the full commit graph
> and the timestamps in ref-updates.log, but it would be much easier if
> each push simply recorded the hash of each new commit that got
> uploaded by that user.

I'm not sure I agree with the characterization of much here. It'd be a
relatively simple script to read lines from this and check for commits
that didn't already exist in the repository. On the other hand, moving
to the log entry per commit means that we have to play around with how
we log things a bit. The obvious way is one line per commit sha, but
then we're losing the relation of the update. Or we could log the
commits and then a line for the update, but then the log has multiple
times of records.

> Such a log of pushed commits would also contain the information that I
> believe the current intention is to have in the %ce entries.

I don't think this is the same thing. The ref-updates.log contains who
updated what refs during a push. This is separate from who made the
commit. Ie, the point of checking %ce in the commit is to check that a
commit was entered by a project committer but this importantly not the
same as who's making the push. Ie, this allows a committer to push
updates that include commits by other committers.

>> As to the committer email checks, I was a bit uncertain when
>> implementing this part. One the one hand I wanted to be very specific
>> that every commit was committed by someone acceptable for the project
>> but this has the already pointed out draw back that it requires
>> committers to rebase commits after reviewing them which doesn't work
>> so hot with some peoples' work flow.
> Right. That's why I'd like to drop that check, or at least make it
> configurable per repository.
> There's also the problem that overriding the %ce field in a rebase or
> an explicit filter-branch operation could in some cases destroy useful
> information (distinction between who wrote the change and who made the
> original git commit), which wouldn't be a problem for the separate
> commit log or the Signed-Off-By entries I mentioned.

I'm not sure I follow. The %ae field should always be the person that
wrote the commit. The %ce is who made the commit. As to Signed-Off-By,
I'm purposefully avoiding any use of that in the hooks and leaving
it's use open to individual projects to use as they see fit.

>> In a perfect world what I'd really like to see here is the ldap
>> self-service have email fields and an ICLA/CCLA check boolean so that
>> we could just check that the email address is connected to a CLA of
>> some sort. I'm quite hesitant to remove all checks from this part
>> because it's quite easy for those fields to be garbage when someone
>> forgets to set their name or if someone accidentally pulls unreviewed
>> code in a fat finger branch update.
> I'd treat those as social problems that are best solved by social
> means. Introducing technical barriers will just force people to use
> awkward workarounds when things like pull requests don't work like
> they'd expect. After all, there are plenty of cases where people have
> accidentally committed extra files or other garbage with Subversion. I
> don't see much difference here.
> I wouldn't object to having the %ce check available as an option for
> projects that want it or even enabled by default, but IMHO it
> shouldn't be mandatory for all projects.
> BR,
> Jukka Zitting

I would argue quite the opposite. Specifically because of things like
pull requests we should be going the extra mile here to try and
prevent random garbage from entering the repository. I do understand
that the %ce check is a bit clunky, but I can already say that it's
prevented commits into CouchDB that would be at best an eyesore and at
worst a provenance problem.

I don't think the way to make this better is to throw away the check,
but to make the check better by integrating with id.apache.org and
setting it up so that people can submit ICLA/CCLA's online.

View raw message