Mailing-List: contact infrastructure-dev-help@apache.org; run by ezmlm
Precedence: bulk
Reply-To: infrastructure-dev@apache.org
Received-SPF: pass (nike.apache.org: domain of paul.joseph.davis@gmail.com
 designates 209.85.212.50 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAOFYJNboAF=SfT-g67hvxL8wYsKgLC0inFxb=p0wRM2VBTK77g@mail.gmail.com>
References: 
 <CAOFYJNaM2b5yO=2TCM-qv2Sa9zN_pzYUCb+m91sFC=WD+y5QeA@mail.gmail.com>
 <CAJ_m3YBFLHhoeXn+_SYHCmMMFsL7osZFG=2d6RwSuuTSeYM5fg@mail.gmail.com>
 <CAOFYJNZJnFbL3-aXABN=30+w3-dyTz_rHy5RiN9cjJTHMCo79g@mail.gmail.com>
 <CAJ_m3YD-MZjRsKKvfhk8JddKBaxohY6kQEAdfVEmCuMEfqH0Bw@mail.gmail.com>
 <CAOFYJNbXDWKChTMG6CJCALxU3LTS3Xw9Sv2wSYCB2SSRFpkv-Q@mail.gmail.com>
 <CAJ_m3YACwsP5OFYdWA=grbySGCPVpoYdKPiL3ga93EijC0GjiA@mail.gmail.com>
 <CAOFYJNZz+WgXFuUEV+9e+g9HymR=5gTDwoxd_aOO-R_N2v4-9w@mail.gmail.com>
 <CAJ_m3YDeQ1=sXKG77F9Fokj=yA8=UfFX4o2_V_OHFh2vFJ+orQ@mail.gmail.com>
 <8FFCC1B8-280E-4003-90C2-24246AAA1EDD@gbiv.com>
 <CAJ_m3YBM2cu-a6ZxMfkmKb5CEXo8=dfnzSuZEZwqd1TY4q7ckw@mail.gmail.com>
 <CAJ_m3YAidYOMbB4SBNC-Z_Ut4fcH5vQ478RMqJC+03uTJnfHGw@mail.gmail.com>
 <CAOFYJNbekBScwi65yLrfe1VShf8EkhYxabOVoWVA4E1zk2ueNw@mail.gmail.com>
 <CAJ_m3YBfiyLNR_4rCa0HxXJfWRe3V4d3jESsonRu_w38gUk2-g@mail.gmail.com>
 <CAOFYJNboAF=SfT-g67hvxL8wYsKgLC0inFxb=p0wRM2VBTK77g@mail.gmail.com>
From: Paul Davis <paul.joseph.davis@gmail.com>
Date: Tue, 20 Dec 2011 14:04:50 -0600
Message-ID: 
 <CAJ_m3YCE1ia=_xhhVfr2pFnctvqBUXwRzko2A5-BJAnWKwAxoQ@mail.gmail.com>
Subject: Re: @apache.org commit address requirement (Was: Git hosting is go)
To: infrastructure-dev@apache.org
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 19, 2011 at 5:15 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
>
> Perhaps we indeed should try discussing this on a higher level.
>
> I guess we all agree that in the push log we already have a mechanism
> that can be used to trace each line of code in a repository back to a
> CLA (or a software grant for the initial upload). This was the key
> issue that needed to be solved to bring Git up to par with Subversion
> in terms of code provenance.
>

The reflog is useful, but I don't think is the end of the story. For
instance, what happens if an ASF committer works on a branch, and then
a second ASF committer pushes those changes to master? The reflog
would tell us that the first ASF committer introduced the code into
the repository, but the second ASF committer is the person responsible
for that code eventually going into a release which is (from what I
understand) the more important of the two roles.

At this point who gets blamed if something is wrong?

Where as, if we instead check that the Git commiter (%ce) field is
person who should be cleared by a CLA then we don't have such
questions. And we also don't have to concern ourselves for when the
initial branch was developed outside of the canonical ASF repository
(ie, not in the reflog).

> The topic of this debate then turns more around extra rules for
> enforcing that even non-committers who contribute a "large enough"
> chunk of code should have a CLA on file. The questions then are a)
> should we do this, and if yes, b) what's the best way to do it? Here's
> my thinking:
>
> a) So far I haven't seen a compelling reason of why we should have
> such rules, just vague references on how we can't trust people to do
> the right thing. You asked for a discussion on the "resulting issues
> of not having these checks". What issues are those? Why don't we have
> them already with Subversion? The idea of such checks sounds useful,
> but I'm not convinced that we need them in practice. That said, I'm
> open to being convinced otherwise. Until that, -0.
>

If we don't have such checks and we don't change the existing
policies, then we have to ensure that all committers understand that
patches go through JIRA and must be applied with git am or similar.

There's also the issue that Git patch sets aren't necessarily the same
as SVN patches. Git best practices instruct people to write smaller
patches in larger numbers so we'd also need to provide new rules of
thumb in when a patch set is significantly non-trivial while
reflecting on this issue.

This isn't about not trusting people so much as our current policies
don't allow people to do things they might otherwise assume are
reasonable. For instance, your original motivation in this thread had
to do with Pull Requests on GitHub. With our current policies, these
are forbidden for all but the most trivial of patches (without a good
definition for trivial in Git parlance). The entire motivation for
having them was to enable participation with contributors using
GitHub's model.

> b) Assuming there is consensus on having these checks, I think using
> %ce is reasonable starting point but not an ideal solution. At the
> risk of bikeshedding, here's my alternative proposal:
>
> * I agree with your idea that a good way to approach this would be to
> require extra scrutiny on all changes that come from people who don't
> have a CLA on file. Asking a committer to explicitly "approve" such
> changes (beyond the simple act committing/pushing them) seems like a
> reasonable requirement. I also agree that such approval would be best
> recorded in a commit message (even when doing so may break Git
> history; that's a good incentive on getting the CLA filed).
>
> * I disagree that the %ce field is a good place for such information.
> First, using it for this purpose overwrites existing information
> recorded for a different purpose. Second, the %ce field always gets
> set in a git commit/rebase/cherry-pick/etc. operation, regardless of
> whether such "approval" of the contribution is indeed intended. Third,
> %ce is a pretty low-level field that few people even know about (it
> doesn't show up by default in "git log" and isn't visible on Github).
>
> * IMHO a better place for this information would be a Signed-off-by
> line added to the commit message. The Signed-off-by convention was
> explicitly created for this purpose [1] and is well supported by many
> tools (starting with "git commit -s"). It's also a well-known and
> -documented convention whose purpose practically everyone understands
> (SCO...). Finally, such Signed-off-by lines are by default visible in
> "git log" and on other tools like Github.
>

I'm not a big fan of managing this with the signed-off-by approach.
While it does exist as a "standard" in some projects, it seems like
its just a hack because they didn't assign the same legal weight to
the committer field before it was too late.

I agree that everyone understands it practically, not very many people
use it in my experience. Some of the big projects like Linux and Git
use it due to SCO issues. I think some tools like Gerrit insert it as
well. Other than that I don't see it as a huge standard being picked
up by more normal sized projects.

Technical implementation wise, it's also a bit of a burden as its not
an actual field in the commit message and has to be parsed out by hand
AFAIK. Not insurmountable but its mildly annoying.

I do see your point about accidentally updating commits with various
commands that might not be the intended action. Though I think once
people are going to push things towards the canonical repository then
their intention should be pretty clear that they've checked and intend
these commits for inclusion and have done their duty as required by
the CLA.

> * So, instead of checking the %ce field, my proposal would be a check
> that verifies that either the recorded author or at least one of the
> (possible multiple) signers-off of an incoming git commit has a CLA on
> file. And for this to only require rewriting of commits ("git commit
> --amend -s") when actually needed, the check should be against all the
> recorded CLAs (ideally with possibly multiple email addresses per CLA)
> instead of just the committers of a project. Thus an Apache committer
> would only need to modify an incoming commit if its author doesn't
> already have a CLA on file.
>

Minus the disagreement on the field we check, this is the goal I'm
going for. Anyone with a CLA on file can contribute without requiring
that the patch goes through JIRA. I'm open to revising which fields
are checked, but it still seems most obvious that the committer (%ce
field) is the person directly responsible for having cleared the code
and hence is what should be checked.

> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/SubmittingPatches;hb=HEAD#l297
>
> BR,
>
> Jukka Zitting