Return-Path: X-Original-To: apmail-infrastructure-dev-archive@minotaur.apache.org Delivered-To: apmail-infrastructure-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 759A29517 for ; Tue, 20 Dec 2011 20:06:00 +0000 (UTC) Received: (qmail 31365 invoked by uid 500); 20 Dec 2011 20:06:00 -0000 Delivered-To: apmail-infrastructure-dev-archive@apache.org Received: (qmail 31195 invoked by uid 500); 20 Dec 2011 20:06:00 -0000 Mailing-List: contact infrastructure-dev-help@apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: infrastructure-dev@apache.org Delivered-To: mailing list infrastructure-dev@apache.org Received: (qmail 31187 invoked by uid 99); 20 Dec 2011 20:06:00 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Dec 2011 20:06:00 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=10 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of paul.joseph.davis@gmail.com designates 209.85.212.50 as permitted sender) Received: from [209.85.212.50] (HELO mail-vw0-f50.google.com) (209.85.212.50) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Dec 2011 20:05:52 +0000 Received: by vbbey12 with SMTP id ey12so6698821vbb.23 for ; Tue, 20 Dec 2011 12:05:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=u+XUdoOTUBxNsUUkP27H6NgMUkF5yv7+/C7CeHyycZo=; b=PF4ZugVhQqhcqDDFxKTHZGVP0U8IYdHJ9qWhQU/1f9TH2mdI5SNdjD21oM93jTln26 Zm3aBu/Df4u4pktU6us7uc1Nl3YEaBp+iKk60tdup0KCYZ/TGK5f9EQZWKlAAzWiq/J7 RnLwNsxuhBt87v7gm0/XIkj/zv7i3aGsF2s8Q= Received: by 10.52.89.34 with SMTP id bl2mr2182512vdb.116.1324411531203; Tue, 20 Dec 2011 12:05:31 -0800 (PST) MIME-Version: 1.0 Received: by 10.220.190.202 with HTTP; Tue, 20 Dec 2011 12:04:50 -0800 (PST) In-Reply-To: References: <8FFCC1B8-280E-4003-90C2-24246AAA1EDD@gbiv.com> From: Paul Davis Date: Tue, 20 Dec 2011 14:04:50 -0600 Message-ID: Subject: Re: @apache.org commit address requirement (Was: Git hosting is go) To: infrastructure-dev@apache.org Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Checked: Checked by ClamAV on apache.org On Mon, Dec 19, 2011 at 5:15 PM, Jukka Zitting wrote: > Hi, > > Perhaps we indeed should try discussing this on a higher level. > > I guess we all agree that in the push log we already have a mechanism > that can be used to trace each line of code in a repository back to a > CLA (or a software grant for the initial upload). This was the key > issue that needed to be solved to bring Git up to par with Subversion > in terms of code provenance. > The reflog is useful, but I don't think is the end of the story. For instance, what happens if an ASF committer works on a branch, and then a second ASF committer pushes those changes to master? The reflog would tell us that the first ASF committer introduced the code into the repository, but the second ASF committer is the person responsible for that code eventually going into a release which is (from what I understand) the more important of the two roles. At this point who gets blamed if something is wrong? Where as, if we instead check that the Git commiter (%ce) field is person who should be cleared by a CLA then we don't have such questions. And we also don't have to concern ourselves for when the initial branch was developed outside of the canonical ASF repository (ie, not in the reflog). > The topic of this debate then turns more around extra rules for > enforcing that even non-committers who contribute a "large enough" > chunk of code should have a CLA on file. The questions then are a) > should we do this, and if yes, b) what's the best way to do it? Here's > my thinking: > > a) So far I haven't seen a compelling reason of why we should have > such rules, just vague references on how we can't trust people to do > the right thing. You asked for a discussion on the "resulting issues > of not having these checks". What issues are those? Why don't we have > them already with Subversion? The idea of such checks sounds useful, > but I'm not convinced that we need them in practice. That said, I'm > open to being convinced otherwise. Until that, -0. > If we don't have such checks and we don't change the existing policies, then we have to ensure that all committers understand that patches go through JIRA and must be applied with git am or similar. There's also the issue that Git patch sets aren't necessarily the same as SVN patches. Git best practices instruct people to write smaller patches in larger numbers so we'd also need to provide new rules of thumb in when a patch set is significantly non-trivial while reflecting on this issue. This isn't about not trusting people so much as our current policies don't allow people to do things they might otherwise assume are reasonable. For instance, your original motivation in this thread had to do with Pull Requests on GitHub. With our current policies, these are forbidden for all but the most trivial of patches (without a good definition for trivial in Git parlance). The entire motivation for having them was to enable participation with contributors using GitHub's model. > b) Assuming there is consensus on having these checks, I think using > %ce is reasonable starting point but not an ideal solution. At the > risk of bikeshedding, here's my alternative proposal: > > * I agree with your idea that a good way to approach this would be to > require extra scrutiny on all changes that come from people who don't > have a CLA on file. Asking a committer to explicitly "approve" such > changes (beyond the simple act committing/pushing them) seems like a > reasonable requirement. I also agree that such approval would be best > recorded in a commit message (even when doing so may break Git > history; that's a good incentive on getting the CLA filed). > > * I disagree that the %ce field is a good place for such information. > First, using it for this purpose overwrites existing information > recorded for a different purpose. Second, the %ce field always gets > set in a git commit/rebase/cherry-pick/etc. operation, regardless of > whether such "approval" of the contribution is indeed intended. Third, > %ce is a pretty low-level field that few people even know about (it > doesn't show up by default in "git log" and isn't visible on Github). > > * IMHO a better place for this information would be a Signed-off-by > line added to the commit message. The Signed-off-by convention was > explicitly created for this purpose [1] and is well supported by many > tools (starting with "git commit -s"). It's also a well-known and > -documented convention whose purpose practically everyone understands > (SCO...). Finally, such Signed-off-by lines are by default visible in > "git log" and on other tools like Github. > I'm not a big fan of managing this with the signed-off-by approach. While it does exist as a "standard" in some projects, it seems like its just a hack because they didn't assign the same legal weight to the committer field before it was too late. I agree that everyone understands it practically, not very many people use it in my experience. Some of the big projects like Linux and Git use it due to SCO issues. I think some tools like Gerrit insert it as well. Other than that I don't see it as a huge standard being picked up by more normal sized projects. Technical implementation wise, it's also a bit of a burden as its not an actual field in the commit message and has to be parsed out by hand AFAIK. Not insurmountable but its mildly annoying. I do see your point about accidentally updating commits with various commands that might not be the intended action. Though I think once people are going to push things towards the canonical repository then their intention should be pretty clear that they've checked and intend these commits for inclusion and have done their duty as required by the CLA. > * So, instead of checking the %ce field, my proposal would be a check > that verifies that either the recorded author or at least one of the > (possible multiple) signers-off of an incoming git commit has a CLA on > file. And for this to only require rewriting of commits ("git commit > --amend -s") when actually needed, the check should be against all the > recorded CLAs (ideally with possibly multiple email addresses per CLA) > instead of just the committers of a project. Thus an Apache committer > would only need to modify an incoming commit if its author doesn't > already have a CLA on file. > Minus the disagreement on the field we check, this is the goal I'm going for. Anyone with a CLA on file can contribute without requiring that the patch goes through JIRA. I'm open to revising which fields are checked, but it still seems most obvious that the committer (%ce field) is the person directly responsible for having cleared the code and hence is what should be checked. > [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/SubmittingPatches;hb=HEAD#l297 > > BR, > > Jukka Zitting