www-infrastructure-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: @apache.org commit address requirement (Was: Git hosting is go)
Date Tue, 20 Dec 2011 23:48:14 GMT
On Tue, Dec 20, 2011 at 2:22 PM, Jeremy Thomerson
<jeremy@thomersonfamily.com> wrote:
> On Tue, Dec 20, 2011 at 3:04 PM, Paul Davis <paul.joseph.davis@gmail.com>wrote:
>> On Mon, Dec 19, 2011 at 5:15 PM, Jukka Zitting <jukka.zitting@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > Perhaps we indeed should try discussing this on a higher level.
>> >
>> > I guess we all agree that in the push log we already have a mechanism
>> > that can be used to trace each line of code in a repository back to a
>> > CLA (or a software grant for the initial upload). This was the key
>> > issue that needed to be solved to bring Git up to par with Subversion
>> > in terms of code provenance.
>> >
>> The reflog is useful, but I don't think is the end of the story. For
>> instance, what happens if an ASF committer works on a branch, and then
>> a second ASF committer pushes those changes to master? The reflog
>> would tell us that the first ASF committer introduced the code into
>> the repository, but the second ASF committer is the person responsible
>> for that code eventually going into a release which is (from what I
>> understand) the more important of the two roles.
>> At this point who gets blamed if something is wrong?
> The guy who pushed it into the branch that is released, which is in this
> case the "pusher", which I think corresponds to your "first ASF
> committer".  This is the same as if I am doing development in a branch in
> SVN and some random Joe grabs my work and merges it to trunk.  When someone
> cuts a release it's his fault that it's in the release - not mine as the
> guy who wrote it (somewhere where it wouldn't be released).

No, that'd be the second ASF committer that pulled commits into
master. The fact that there's confusion on who's who only serves to
underscore the main point that the whole thing is confusing, so why
not validate CLA's at the commit level?

> Where as, if we instead check that the Git commiter (%ce) field is
>> person who should be cleared by a CLA then we don't have such
>> questions. And we also don't have to concern ourselves for when the
>> initial branch was developed outside of the canonical ASF repository
>> (ie, not in the reflog).
> But it still brings us back to some guy not being able to actually use the
> distributed feature of the version control system because if he doesn't
> have an ICLA he can not commit (into his own repo) and then ask us to pull
> from that repo.

I have no idea what you mean here. Any external contributor can do
whatever they want to their git repo and ask people to pull from
whatever repo and do whatever their hearts desire. What this *does* do
is check that when an ASF committer pushes to the ASF canonical repo
that the commits are tied to a CLA. The current system bases this off
the %ce field. This means that to "sign off" on a commit, someone
cherry-pick'ed or otherwise updated the committer field to reflect
that that ASF committer has cleared the IP as per ASF requirements.

> IMHO, anyone should be able to commit to any repo they want and push to any
> repo they have access to (whether it's on their server, Github or similar
> alternative sites), and then ask us to pull it.  If it's trivial, we can
> pull without ICLA - the same as taking a trivial patch.  If it's
> non-trivial, they need to either have an ICLA or submit it to JIRA and
> check the box.  If it's major, ICLA and possibly more paperwork needs to be
> in order.  But in all those cases it is up to the ASF project committer to
> actually check this before pushing someone else's commits into ASF repo.

You keep talking about pulling and it confuses me. These checks are
not based on pulling commits to a local repo. They only come into
effect when the commit is pushed to the ASF canonical repo.

>>  > The topic of this debate then turns more around extra rules for
>> > enforcing that even non-committers who contribute a "large enough"
>> > chunk of code should have a CLA on file. The questions then are a)
>> > should we do this, and if yes, b) what's the best way to do it? Here's
>> > my thinking:
>> >
>> > a) So far I haven't seen a compelling reason of why we should have
>> > such rules, just vague references on how we can't trust people to do
>> > the right thing. You asked for a discussion on the "resulting issues
>> > of not having these checks". What issues are those? Why don't we have
>> > them already with Subversion? The idea of such checks sounds useful,
>> > but I'm not convinced that we need them in practice. That said, I'm
>> > open to being convinced otherwise. Until that, -0.
>> >
>> If we don't have such checks and we don't change the existing
>> policies, then we have to ensure that all committers understand that
>> patches go through JIRA and must be applied with git am or similar.
>> There's also the issue that Git patch sets aren't necessarily the same
>> as SVN patches. Git best practices instruct people to write smaller
>> patches in larger numbers so we'd also need to provide new rules of
>> thumb in when a patch set is significantly non-trivial while
>> reflecting on this issue.
>> This isn't about not trusting people so much as our current policies
>> don't allow people to do things they might otherwise assume are
>> reasonable. For instance, your original motivation in this thread had
>> to do with Pull Requests on GitHub. With our current policies, these
>> are forbidden for all but the most trivial of patches (without a good
>> definition for trivial in Git parlance). The entire motivation for
>> having them was to enable participation with contributors using
>> GitHub's model.
>> > b) Assuming there is consensus on having these checks, I think using
>> > %ce is reasonable starting point but not an ideal solution. At the
>> > risk of bikeshedding, here's my alternative proposal:
>> >
>> > * I agree with your idea that a good way to approach this would be to
>> > require extra scrutiny on all changes that come from people who don't
>> > have a CLA on file. Asking a committer to explicitly "approve" such
>> > changes (beyond the simple act committing/pushing them) seems like a
>> > reasonable requirement. I also agree that such approval would be best
>> > recorded in a commit message (even when doing so may break Git
>> > history; that's a good incentive on getting the CLA filed).
>> >
>> > * I disagree that the %ce field is a good place for such information.
>> > First, using it for this purpose overwrites existing information
>> > recorded for a different purpose. Second, the %ce field always gets
>> > set in a git commit/rebase/cherry-pick/etc. operation, regardless of
>> > whether such "approval" of the contribution is indeed intended. Third,
>> > %ce is a pretty low-level field that few people even know about (it
>> > doesn't show up by default in "git log" and isn't visible on Github).
>> >
>> > * IMHO a better place for this information would be a Signed-off-by
>> > line added to the commit message. The Signed-off-by convention was
>> > explicitly created for this purpose [1] and is well supported by many
>> > tools (starting with "git commit -s"). It's also a well-known and
>> > -documented convention whose purpose practically everyone understands
>> > (SCO...). Finally, such Signed-off-by lines are by default visible in
>> > "git log" and on other tools like Github.
>> >
>> I'm not a big fan of managing this with the signed-off-by approach.
>> While it does exist as a "standard" in some projects, it seems like
>> its just a hack because they didn't assign the same legal weight to
>> the committer field before it was too late.
>> I agree that everyone understands it practically, not very many people
>> use it in my experience. Some of the big projects like Linux and Git
>> use it due to SCO issues. I think some tools like Gerrit insert it as
>> well. Other than that I don't see it as a huge standard being picked
>> up by more normal sized projects.
>> Technical implementation wise, it's also a bit of a burden as its not
>> an actual field in the commit message and has to be parsed out by hand
>> AFAIK. Not insurmountable but its mildly annoying.
>> I do see your point about accidentally updating commits with various
>> commands that might not be the intended action. Though I think once
>> people are going to push things towards the canonical repository then
>> their intention should be pretty clear that they've checked and intend
>> these commits for inclusion and have done their duty as required by
>> the CLA.
> Doesn't that mean that we're saying "yes, an ASF committer will do his due
> diligence before pushing to ASF canonical repo"?  If so, why are we
> imposing the technical barrier?

Yes and I don't know where the second question comes from. Words are
getting a bit overloaded for me to follow your intention.

Yes, ASF committers must do their due diligence. But unless someone
has changed policy about patches going through JIRA then this isn't a
technical barrier at all because by definition the commit will have
the ASF committers email address as the %ce field and all this does is
tell people when they're breaking policy.

>>  > * So, instead of checking the %ce field, my proposal would be a check
>> > that verifies that either the recorded author or at least one of the
>> > (possible multiple) signers-off of an incoming git commit has a CLA on
>> > file. And for this to only require rewriting of commits ("git commit
>> > --amend -s") when actually needed, the check should be against all the
>> > recorded CLAs (ideally with possibly multiple email addresses per CLA)
>> > instead of just the committers of a project. Thus an Apache committer
>> > would only need to modify an incoming commit if its author doesn't
>> > already have a CLA on file.
>> >
>> Minus the disagreement on the field we check, this is the goal I'm
>> going for. Anyone with a CLA on file can contribute without requiring
>> that the patch goes through JIRA. I'm open to revising which fields
>> are checked, but it still seems most obvious that the committer (%ce
>> field) is the person directly responsible for having cleared the code
>> and hence is what should be checked.
> If we are going to enforce a technical barrier for git pushes, I agree with
> the above that it should be:
>  * any email address on file for a CLA (not just @apache.org)
>  * anyone in the chain of author/committer/signed off by
> That being said, I still think that it's fine to just leave it as:
> The guy who is pushing is authenticated as an ASF committer for this
> project, and it's his responsibility to ensure that what he is pushing is
> okay to be pushed - the same as it was his responsibility to make sure that
> any code he committed to SVN was okay to commit.  I haven't seen any reason
> for making this a technical restriction when it has been a social
> restriction for years (social in that it's up to our society of ASF
> committers to do the due diligence before committing code, not some commit
> hook that checks it for them).  Sure, git gives us more metadata like
> author/committer/etc, but IMO just because we have that doesn't mean that
> we need to make new rules based on it.
> Jeremy

Once again I'm going to point out that current patches must move
through JIRA. Assuming people follow this policy then the %ce field is
by definition an ASF committer on the Apache project in question. Full

As it currently stands, if people attempt to push something that
doesn't meet this check then they have either misconfigured their
client or they're breaking policy (or they're pushing trivial patches
which, as the name suggests, are trivial to fix). Given this, why
would we *not* want a check here.

Now obviously, this makes Git a bit limiting to people that are used
to using it extensively as a platform for trading patches. In order to
allow such use it has been planned for quite some time to ease the
registration of an ICLA (hopefully even digitally) and an email
address that we can use to clear commits automatically.

View raw message