Date: Mon, 19 Sep 2011 14:05:27 -0400
In-Reply-To: <4E777995.1070201@wtnet.de>
References: <4E7731E4.1000008@wtnet.de> <00a901cc76ea$18650440$492f0cc0$@acm.org> <4E777995.1070201@wtnet.de>
Subject: Re: A systematic approach to IP review?
From: Rob Weir
To: ooo-dev@incubator.apache.org

On Mon, Sep 19, 2011 at 1:19 PM, Marcus (OOo) wrote:
> On 09/19/2011 06:54 PM, Rob Weir wrote:
>>
>> On Mon, Sep 19, 2011 at 12:35 PM, Dennis E. Hamilton wrote:
>>>
>>> Rob,
>>>
>>> I was reading the suggestion from Marcus as it being that since the code base is in a folder structure (modularized) and the wiki can map folder structures and their status nicely, it is not necessary to have a single table to manage this from, but have any tables be at some appropriate granularity toward the leaves of the hierarchy (on the wiki).
>>>
>>
>> Using the wiki for this might be useful for tracking the status of modules we already know we need to replace. Bugzilla would be another way to track the status.
>
> How do you want to use Bugzilla to track thousands of files?
>

No. But for tracking module review, Bugzilla might be better than the wiki. It allows us to have a conversation on each module via comments.

>> But it is not really a sufficient solution. Why? Because it is not tied to the code and is not reproducible. How was the list of components listed in the wiki generated? Based on what script? Where is the script? How do we know it is accurate and current? How do we know that integrating a CWS does not make that list become outdated? How do we prove to ourselves that we did this right? And how do we record that proof as a record? And how do we repeat this proof every time we do a new release?
>
> Questions over questions but not helpful. ;-)
>
>> A list of components of unknown derivation sitting on a community wiki that anyone can edit is not really a suitable basis for an IP review.
>
> Then restrict the write access.
>
>> The granularity we need to worry about is the file.
>> That is the finest-grained level of having a license header. That is the unit of tracking in SVN. That is the unit whose content someone could have changed in SVN.
>>
>> Again, it is fine if someone wants to outline this at the module level. But that does not eliminate the requirement for us to do this at the file level as well.
>
> IMHO you haven't understood what I wanted to tell you.
>

I understand what you are saying. I just don't agree with you.

> Sure it makes no sense to create a list of every file in SVN to see if the license is good or bad. So, do it module by module. And when a module is marked as "done", then of course every file in the modules was checked. Otherwise it's not working.
>

That is not a consistent approach. Every developer applies their own criteria. It is not reproducible. It leaves no audit trail. And it doesn't help us with the next release.

If you use the Apache Release Audit Tool (RAT) then it will check all the files automatically.

> And how to make sure that there was no change when source was added/moved/improved? Simply Commit Then Review (CTR). A change in the license header at the beginning should be remarkable, right? However, we also need to have trust in everybody's work.
>

We would run RAT before every release and with every significant code contribution. You can think of this as a form of CTR, but one that is automated, with a consistent rule set.

Obviously, good CTR plus the work on the wiki will all help. But we need the RAT scans as well, to show that we're clean.

> BTW:
> What is your plan to track every file to make sure the license is OK?
>

Run RAT. That is what it does.

> Marcus
>
>
>
>>> I can see some brittle cases, especially in the face of refactoring. The use of the wiki might have to be an ephemeral activity that is handled this way entirely for our initial scrubbing.
>>>
>>> Ideally, additional and sustained review would be in the SVN with the artifacts so reviewed, and coalesced somehow. The use of SVN properties is interesting, but they are rather invisible and I have a question about what happens with them when a commit happens against the particular artifact.
>>>
>>
>> Properties stick with the file, unless changed. Think of the svn:eol-style property. It is not wiped out with a new revision of the file.
>>
>>> It seems that there is some need to balance an immediate requirement and what would be sufficient for it versus what would assist us in the longer term. It would be interesting to know what the additional-review work has become for other projects that have a substantial code base (e.g., SVN itself, httpd, ...). I have no idea.
>>>
>>
>> The IP review needs to occur with every release. So the work we do to automate this, and make it data-driven, will repay itself with every release.
>>
>> I invite you to investigate what other projects do. When you do I think you will agree.
>>
>>> - Dennis
>>>
>>> -----Original Message-----
>>> From: Rob Weir [mailto:robweir@apache.org]
>>> Sent: Monday, September 19, 2011 07:47
>>> To: ooo-dev@incubator.apache.org
>>> Subject: Re: A systematic approach to IP review?
>>>
>>> On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) wrote:
>>>>
>>>> On 09/19/2011 01:59 PM, Rob Weir wrote:
>>>>>
>>>>> 2011/9/19 Jürgen Schmidt:
>>>>>>
>>>>>> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir wrote:
>>>>>>
>>>>>>> If you haven't looked at it closely, it is probably worth a few minutes of your time to review our incubation status page, especially the items under "Copyright" and "Verify Distribution Rights".
>>>>>>> It lists the things we need to do, including:
>>>>>>>
>>>>>>> -- Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project.
>>>>>>>
>>>>>>> -- Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright.
>>>>>>>
>>>>>>> -- Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute.
>>>>>>>
>>>>>>> -- Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms.
>>>>>>>
>>>>>>> Some of this is already going on, but it is hard to get a sense of who is doing what and how much progress we have made. I wonder if we can agree to a more systematic approach? This will make it easier to see the progress we're making and it will also make it easier for others to help.
>>>>>>>
>>>>>>> Suggestions:
>>>>>>>
>>>>>>> 1) We need to get all files needed for the build into SVN. Right now there are some that are copied down from the OpenOffice.org website during the build's bootstrap process. Until we get the files all in one place it is hard to get a comprehensive view of our dependencies.
>>>>>>>
>>>>>>
>>>>>> Do you mean to check in the files under ext_source into svn and remove them later on when we have cleaned up the code? Or do you mean to put them somewhere on Apache Extras?
>>>>>> I would prefer to save these binary files under Apache Extras if possible.
>>>>>>
>>>>>
>>>>>
>>>>> Why not just keep it in SVN? Moving things to Apache-Extras does not help us with the IP review. In other words, if we have a dependency on an OSS module that has an incompatible license, then moving that module to Apache Extras does not make that dependency go away. We still need to understand the nature of the dependency: a build tool, a dynamic runtime dependency, a statically linked library, an optional extension, a necessary core module.
>>>>>
>>>>> If we find out, for example, that something in ext-sources is only used as a build tool, and is not part of the release, then there is nothing that prevents us from hosting it in SVN. But if something is a necessary library and it is under GPL, then this is a problem even if we store it on Apache-Extras.
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2) Continue the CWS integrations. Along with 1) this ensures that all the code we need for the release is in SVN.
>>>>>>>
>>>>>>> 3) Files that Oracle includes in their SGA need to have the Apache license header inserted and the Sun/Oracle copyright migrated to the NOTICE file. Apache RAT (Release Audit Tool) [2] can be used to automate parts of this.
>>>>>>>
>>>>>>> 4) Once the SGA files have the Apache headers, then we can make regular use of RAT to report on files that are lacking an Apache header. Such files might be in one of the following categories:
>>>>>>>
>>>>>>> a) Files that Oracle owns the copyright on and which should be included in an amended SGA
>>>>>>>
>>>>>>> b) Files that have a compatible OSS license which we are permitted to use. This might require that we add a mention of it to the NOTICE file.
>>>>>>>
>>>>>>> c) Files that have an incompatible OSS license. These need to be removed/replaced.
>>>>>>>
>>>>>>> d) Files that have an OSS license that has not yet been reviewed/categorized by Apache legal affairs. In that case we need to bring it to their attention.
>>>>>>>
>>>>>>> e) (Hypothetically) files that are not under an OSS license at all. E.g., a Microsoft header file. These must be removed.
>>>>>>>
>>>>>>> 5) We should track the resolution of each file, and do this publicly. The audit trail is important. Some ways we could do this might be:
>>>>>>>
>>>>>>> a) Track this in SVN properties. So set ip:sga for the SGA files, ip:mit for files that are MIT licensed, etc. This should be reflected in headers as well, but this is not always possible. For example, we might have binary files where we cannot add headers, or cases where the OSS files do not have headers, but where we can prove their provenance via other means.
>>>>>>>
>>>>>>> b) Track this in a spreadsheet, one row per file.
>>>>>>>
>>>>>>> c) Track this in a text log file checked into SVN
>>>>>>>
>>>>>>> d) Track this in an annotated script that runs RAT, where the annotations document the reason for cases where we tell it to ignore a file or directory.
>>>>>>>
>>>>>>> 6) Iterate until we have a clean RAT report.
>>>>>>>
>>>>>>> 7) Goal should be for anyone today to be able to see what work remains for IP clearance, as well as for someone 5 years from now to be able to tell what we did. Tracking this on the community wiki is probably not good enough, since we've previously talked about dropping that wiki and going to MWiki.
>>>>>>>
>>>>>>
>>>>>> talked about it yes but did we reach a final decision?
>>>>>>
>>>>>> The migrated wiki is available under http://ooo-wiki.apache.org/wiki and can be used. Do we want to continue with this wiki now?
>>>>>> It's still not clear for me at the moment.
>>>>>>
>>>>>> But we need a place to document the IP clearance and under http://ooo-wiki.apache.org/wiki/ApacheMigration we have already some information.
>>>>>>
>>>>>
>>>>> This is not really sufficient. The wiki is talking about module-level dependencies. This is a good start and useful for the high level discussion. But we need to look file-by-file. We need to catch the case where (hypothetically) there is a single GPL header file sitting in a core OOo source directory. So we need to review 100,000's of files. Too big for a table on the wiki.
>>>>
>>>> If you think in files then yes, it's too big.
>>>>
>>>> But when you split this up into the application modules and submodules and sub-sub-modules, then different people can work in parallel when it's known who is working in what module.
>>>>
>>>
>>> We don't really have a comprehensive view of the licenses in the source tree until we do a file-by-file scan. Until we do that we just have an approximation.
>>>
>>> But once we have a detailed view, then it is natural to work on the larger chunks module-by-module. Most files we need to worry about will be in a module where we will treat all files in that module the same way. But until proven otherwise, we need to be alert to the possibility that there is a single non-OSS Microsoft header file sitting in a directory someplace. I'm not saying this has actually happened, or that it is likely to have happened. I'm just saying that our review needs to be detailed enough that we can catch such a problem if it occurs.
>>>
>>>
>>>> IMHO this should work and there is always an actual and current overview.
>>>>
>>>> Marcus
>>>>
>>>>
>>>>
>>>>> Note also that doing this kind of check is a prerequisite for every release we do at Apache.
>>>>> So agreeing on what tools and techniques we want to use for this process is important. If we do it right, the next time we do a review it will be very fast and easy, since we'll be able to build upon the review we've already done. That's why I think that either using svn properties or scripts with annotated data files listing "cleared" files is the best approach. Make the review process be data-driven and reproducible using automated tools. It won't totally eliminate the need for manual inspection, but it will: 1) Help parallelize that effort, and 2) Ensure it is only done once per file.
>>>>>
>>>>>> Juergen
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Rob
>>>>>>>
>>>>>>>
>>>>>>> [1] http://incubator.apache.org/projects/openofficeorg.html
>>>>>>>
>>>>>>> [2] http://incubator.apache.org/rat/
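
For illustration, here is a minimal Python sketch of the "annotated data file listing cleared files" idea from suggestion 5d in the thread. It is not RAT and does not use RAT's interface. The marker string, the `pattern # reason` exclusion format, and the names `parse_exclusions` and `scan` are all assumptions made up for this example. The only point it tries to show is that every ignored file or directory carries a recorded reason, so the scan can be rerun before each release and the exclusions double as an audit trail.

```python
import fnmatch
import os

# Assumption: a short marker stands in for the full license header text.
APACHE_MARKER = "Licensed to the Apache Software Foundation"


def parse_exclusions(text):
    """Parse lines of the (hypothetical) form 'glob-pattern # reason'.

    Blank lines and lines starting with '#' are skipped. An exclusion
    without a reason is rejected, so nothing is ignored silently.
    """
    exclusions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, sep, reason = line.partition("#")
        if not sep or not reason.strip():
            raise ValueError(f"exclusion without a documented reason: {line!r}")
        exclusions.append((pattern.strip(), reason.strip()))
    return exclusions


def scan(root, exclusions, marker=APACHE_MARKER, head_bytes=4096):
    """Walk root file by file.

    Returns (unlicensed, excluded): a sorted list of relative paths whose
    first head_bytes lack the marker, and a dict mapping each excluded
    relative path to the recorded reason.
    """
    needle = marker.encode("utf-8")
    unlicensed, excluded = [], {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Subversion metadata directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            reason = next(
                (r for p, r in exclusions if fnmatch.fnmatch(rel, p)), None
            )
            if reason is not None:
                excluded[rel] = reason
                continue
            with open(full, "rb") as fh:
                if needle not in fh.read(head_bytes):
                    unlicensed.append(rel)
    return sorted(unlicensed), excluded
```

A run over a checkout would then look like `unlicensed, excluded = scan("trunk", parse_exclusions(open("ip-exclusions.txt").read()))`, where both the directory name and the exclusions file name are placeholders. Keeping the exclusions file in SVN next to the code is what would tie the review to a specific revision.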