www-infrastructure-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeremy Thomerson <jer...@thomersonfamily.com>
Subject Re: @apache.org commit address requirement (Was: Git hosting is go)
Date Tue, 20 Dec 2011 20:22:56 GMT
On Tue, Dec 20, 2011 at 3:04 PM, Paul Davis <paul.joseph.davis@gmail.com>wrote:

> On Mon, Dec 19, 2011 at 5:15 PM, Jukka Zitting <jukka.zitting@gmail.com>
> wrote:
> > Hi,
> >
> > Perhaps we indeed should try discussing this on a higher level.
> >
> > I guess we all agree that in the push log we already have a mechanism
> > that can be used to trace each line of code in a repository back to a
> > CLA (or a software grant for the initial upload). This was the key
> > issue that needed to be solved to bring Git up to par with Subversion
> > in terms of code provenance.
> >
> The reflog is useful, but I don't think is the end of the story. For
> instance, what happens if an ASF committer works on a branch, and then
> a second ASF committer pushes those changes to master? The reflog
> would tell us that the first ASF committer introduced the code into
> the repository, but the second ASF committer is the person responsible
> for that code eventually going into a release which is (from what I
> understand) the more important of the two roles.

> At this point who gets blamed if something is wrong?

The guy who pushed it into the branch that is released, which is in this
case the "pusher", which I think corresponds to your "first ASF
committer".  This is the same as if I am doing development in a branch in
SVN and some random Joe grabs my work and merges it to trunk.  When someone
cuts a release it's his fault that it's in the release - not mine as the
guy who wrote it (somewhere where it wouldn't be released).

Where as, if we instead check that the Git commiter (%ce) field is
> person who should be cleared by a CLA then we don't have such
> questions. And we also don't have to concern ourselves for when the
> initial branch was developed outside of the canonical ASF repository
> (ie, not in the reflog).

But it still brings us back to some guy not being able to actually use the
distributed feature of the version control system because if he doesn't
have an ICLA he can not commit (into his own repo) and then ask us to pull
from that repo.

IMHO, anyone should be able to commit to any repo they want and push to any
repo they have access to (whether it's on their server, Github or similar
alternative sites), and then ask us to pull it.  If it's trivial, we can
pull without ICLA - the same as taking a trivial patch.  If it's
non-trivial, they need to either have an ICLA or submit it to JIRA and
check the box.  If it's major, ICLA and possibly more paperwork needs to be
in order.  But in all those cases it is up to the ASF project committer to
actually check this before pushing someone else's commits into ASF repo.

>  > The topic of this debate then turns more around extra rules for
> > enforcing that even non-committers who contribute a "large enough"
> > chunk of code should have a CLA on file. The questions then are a)
> > should we do this, and if yes, b) what's the best way to do it? Here's
> > my thinking:
> >
> > a) So far I haven't seen a compelling reason of why we should have
> > such rules, just vague references on how we can't trust people to do
> > the right thing. You asked for a discussion on the "resulting issues
> > of not having these checks". What issues are those? Why don't we have
> > them already with Subversion? The idea of such checks sounds useful,
> > but I'm not convinced that we need them in practice. That said, I'm
> > open to being convinced otherwise. Until that, -0.
> >
> If we don't have such checks and we don't change the existing
> policies, then we have to ensure that all committers understand that
> patches go through JIRA and must be applied with git am or similar.
> There's also the issue that Git patch sets aren't necessarily the same
> as SVN patches. Git best practices instruct people to write smaller
> patches in larger numbers so we'd also need to provide new rules of
> thumb in when a patch set is significantly non-trivial while
> reflecting on this issue.
> This isn't about not trusting people so much as our current policies
> don't allow people to do things they might otherwise assume are
> reasonable. For instance, your original motivation in this thread had
> to do with Pull Requests on GitHub. With our current policies, these
> are forbidden for all but the most trivial of patches (without a good
> definition for trivial in Git parlance). The entire motivation for
> having them was to enable participation with contributors using
> GitHub's model.
> > b) Assuming there is consensus on having these checks, I think using
> > %ce is reasonable starting point but not an ideal solution. At the
> > risk of bikeshedding, here's my alternative proposal:
> >
> > * I agree with your idea that a good way to approach this would be to
> > require extra scrutiny on all changes that come from people who don't
> > have a CLA on file. Asking a committer to explicitly "approve" such
> > changes (beyond the simple act committing/pushing them) seems like a
> > reasonable requirement. I also agree that such approval would be best
> > recorded in a commit message (even when doing so may break Git
> > history; that's a good incentive on getting the CLA filed).
> >
> > * I disagree that the %ce field is a good place for such information.
> > First, using it for this purpose overwrites existing information
> > recorded for a different purpose. Second, the %ce field always gets
> > set in a git commit/rebase/cherry-pick/etc. operation, regardless of
> > whether such "approval" of the contribution is indeed intended. Third,
> > %ce is a pretty low-level field that few people even know about (it
> > doesn't show up by default in "git log" and isn't visible on Github).
> >
> > * IMHO a better place for this information would be a Signed-off-by
> > line added to the commit message. The Signed-off-by convention was
> > explicitly created for this purpose [1] and is well supported by many
> > tools (starting with "git commit -s"). It's also a well-known and
> > -documented convention whose purpose practically everyone understands
> > (SCO...). Finally, such Signed-off-by lines are by default visible in
> > "git log" and on other tools like Github.
> >
> I'm not a big fan of managing this with the signed-off-by approach.
> While it does exist as a "standard" in some projects, it seems like
> its just a hack because they didn't assign the same legal weight to
> the committer field before it was too late.
> I agree that everyone understands it practically, not very many people
> use it in my experience. Some of the big projects like Linux and Git
> use it due to SCO issues. I think some tools like Gerrit insert it as
> well. Other than that I don't see it as a huge standard being picked
> up by more normal sized projects.
> Technical implementation wise, it's also a bit of a burden as its not
> an actual field in the commit message and has to be parsed out by hand
> AFAIK. Not insurmountable but its mildly annoying.
> I do see your point about accidentally updating commits with various
> commands that might not be the intended action. Though I think once
> people are going to push things towards the canonical repository then
> their intention should be pretty clear that they've checked and intend
> these commits for inclusion and have done their duty as required by
> the CLA.

Doesn't that mean that we're saying "yes, an ASF committer will do his due
diligence before pushing to ASF canonical repo"?  If so, why are we
imposing the technical barrier?

>  > * So, instead of checking the %ce field, my proposal would be a check
> > that verifies that either the recorded author or at least one of the
> > (possible multiple) signers-off of an incoming git commit has a CLA on
> > file. And for this to only require rewriting of commits ("git commit
> > --amend -s") when actually needed, the check should be against all the
> > recorded CLAs (ideally with possibly multiple email addresses per CLA)
> > instead of just the committers of a project. Thus an Apache committer
> > would only need to modify an incoming commit if its author doesn't
> > already have a CLA on file.
> >
> Minus the disagreement on the field we check, this is the goal I'm
> going for. Anyone with a CLA on file can contribute without requiring
> that the patch goes through JIRA. I'm open to revising which fields
> are checked, but it still seems most obvious that the committer (%ce
> field) is the person directly responsible for having cleared the code
> and hence is what should be checked.

If we are going to enforce a technical barrier for git pushes, I agree with
the above that it should be:
 * any email address on file for a CLA (not just @apache.org)
 * anyone in the chain of author/committer/signed off by

That being said, I still think that it's fine to just leave it as:

The guy who is pushing is authenticated as an ASF committer for this
project, and it's his responsibility to ensure that what he is pushing is
okay to be pushed - the same as it was his responsibility to make sure that
any code he committed to SVN was okay to commit.  I haven't seen any reason
for making this a technical restriction when it has been a social
restriction for years (social in that it's up to our society of ASF
committers to do the due diligence before committing code, not some commit
hook that checks it for them).  Sure, git gives us more metadata like
author/committer/etc, but IMO just because we have that doesn't mean that
we need to make new rules based on it.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message