openoffice-dev mailing list archives

From Rob Weir <>
Subject Re: A systematic approach to IP review?
Date Mon, 19 Sep 2011 17:05:27 GMT
On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo) <> wrote:
> On 09/19/2011 04:47 PM, Rob Weir wrote:
>> On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo) <> wrote:
>>> On 09/19/2011 01:59 PM, Rob Weir wrote:
>>>> 2011/9/19 Jürgen Schmidt<>:
>>>>> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir<> wrote:
>>>>>> If you haven't looked at it closely, it is probably worth a few minutes
>>>>>> of your time to review our incubation status page, especially the
>>>>>> items under "Copyright" and "Verify Distribution Rights".  It lists
>>>>>> the things we need to do, including:
>>>>>> -- Check and make sure that the papers that transfer rights to the
>>>>>> ASF have been received. It is only necessary to transfer rights for the
>>>>>> package, the core code, and any new code produced by the project.
>>>>>> -- Check and make sure that the files that have been donated have been
>>>>>> updated to reflect the new ASF copyright.
>>>>>> -- Check and make sure that for all code included with the
>>>>>> distribution that is not under the Apache license, we have the right
>>>>>> to combine with Apache-licensed code and redistribute.
>>>>>> -- Check and make sure that all source code distributed by the project
>>>>>> is covered by one or more of the following approved licenses: Apache,
>>>>>> BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially
>>>>>> the same terms.
>>>>>> Some of this is already going on, but it is hard to get a sense of who
>>>>>> is doing what and how much progress we have made.  I wonder if we could
>>>>>> agree to a more systematic approach?  This will make it easier to see
>>>>>> the progress we're making and it will also make it easier for others
>>>>>> to help.
>>>>>> Suggestions:
>>>>>> 1) We need to get all files needed for the build into SVN.  Right now
>>>>>> there are some that are copied down from the website
>>>>>> during the build's bootstrap process.   Until we get the files all in
>>>>>> one place it is hard to get a comprehensive view of our dependencies.
>>>>> Do you mean to check the files under ext_source into SVN and remove
>>>>> them later on when we have cleaned up the code? Or do you mean to put
>>>>> them somewhere on Apache Extras?
>>>>> I would prefer to save these binary files under Apache Extras if
>>>>> possible.
>>>> Why not just keep it in SVN?   Moving things to Apache-Extras does not
>>>> help us with the IP review.   In other words, if we have a dependency
>>>> on a OSS module that has an incompatible license, then moving that
>>>> module to Apache Extras does not make that dependency go away.  We
>>>> still need to understand the nature of the dependency: a build tool, a
>>>> dynamic runtime dependency, a statically linked library, an optional
>>>> extension, or a necessary core module.
>>>> If we find out, for example, that something in ext-sources is only
>>>> used as a build tool, and is not part of the release, then there is
>>>> nothing that prevents us from hosting it in SVN.   But if something is
>>>> a necessary library and it is under GPL, then this is a problem even
>>>> if we store it on Apache-Extras.
>>>>>> 2) Continue the CWS integrations.  Along with 1) this ensures that all
>>>>>> the code we need for the release is in SVN.
>>>>>> 3)  Files that Oracle includes in its SGA need to have the Apache
>>>>>> license header inserted and the Sun/Oracle copyright migrated to the
>>>>>> NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used to
>>>>>> automate parts of this.
>>>>>> 4) Once the SGA files have the Apache headers, then we can make
>>>>>> regular use of RAT to report on files that are lacking an Apache
>>>>>> header.  Such files might be in one of the following categories:
>>>>>> a) Files that Oracle owns the copyright on and which should be
>>>>>> included in an amended SGA
>>>>>> b) Files that have a compatible OSS license which we are permitted to
>>>>>> use.  This might require that we add a mention of it to the NOTICE
>>>>>> file.
>>>>>> c) Files that have an incompatible OSS license.  These need to be
>>>>>> removed/replaced.
>>>>>> d) Files that have an OSS license that has not yet been
>>>>>> reviewed/categorized by Apache legal affairs.  In that case we need to
>>>>>> bring it to their attention.
>>>>>> e) (Hypothetically) files that are not under an OSS license at all.
>>>>>> E.g., a Microsoft header file.  These must be removed.
>>>>>> 5) We should track the resolution of each file, and do this
>>>>>> publicly.  The audit trail is important.  Some ways we could do this
>>>>>> might be:
>>>>>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>>>>>> ip:mit for files that are MIT licensed, etc.  This should be reflected
>>>>>> in headers as well, but this is not always possible.  For example, we
>>>>>> might have binary files where we cannot add headers, or cases where
>>>>>> the OSS files do not have headers, but where we can prove their
>>>>>> provenance via other means.
>>>>>> b) Track this in a spreadsheet, one row per file.
>>>>>> c) Track this in a text log file checked into SVN.
>>>>>> d) Track this in an annotated script that runs RAT, where the
>>>>>> annotations document the reason for cases where we tell it to ignore a
>>>>>> file or directory.
>>>>>> 6) Iterate until we have a clean RAT report.
>>>>>> 7) Goal should be for anyone today to be able to see what work remains
>>>>>> for IP clearance, as well as for someone 5 years from now to be able
>>>>>> to tell what we did.  Tracking this on the community wiki is probably
>>>>>> not good enough, since we've previously talked about dropping that
>>>>>> wiki and going to MWiki.
>>>>> Talked about it, yes, but did we reach a final decision?
>>>>> The migrated wiki is available under
>>>>> and can be used. Do we want to continue with this wiki now? It's still
>>>>> not clear to me at the moment.
>>>>> But we need a place to document the IP clearance and under
>>>>> we have already some information.
>>>> This is not really sufficient. The wiki is talking about module-level
>>>> dependencies.   This is a good start and useful for the high level
>>>> discussion. But we need to look file-by-file.  We need to catch the
>>>> case where (hypothetically) there is a single GPL header file sitting
>>>> in a core OOo source directory.  So we need to review 100,000's of
>>>> files.  Too big for a table on the wiki.
>>> If you think in files then yes, it's too big.
>>> But when you split this up into the application modules and submodules
>>> and sub-sub-modules, then different people can work in parallel when it's
>>> known who is working in what module.
>> We don't really have a comprehensive view of the licenses in the
>> source tree until we do a file-by-file scan.  Until we do that we just
>> have an approximation.
>> But once we have a detailed view, then it is natural to work on the
>> larger chunks module-by-module.  Most files we need to worry about
>> will be in a module where we will treat all files in that module the
>> same way.  But until proven otherwise, we need to be alert to the
>> possibility that there is a single non-OSS Microsoft header file
>> sitting in a directory someplace.  I'm not saying this has actually
>> happened, or that it is likely to have happened.  I'm just saying that
>> our review needs to be detailed enough that we can catch such a
>> problem if it occurs.
> I still see no problem with putting this into the wiki. Create some structure
> and note where you are actually working. Of course it is expected that you
> look at every file in the respective module.
> Or how do you want to keep the overview and let others know what you are
> doing? ;-)

I think the wiki is fine as a collaboration tool, to list tasks and
who is working on them.  But that is not a substitute for running
scans with the Apache Release Audit Tool (RAT) and working toward a
clean report.

Think of it this way:

1) We have a list of modules on the wiki that we need to replace.
Great.  Developers can work on that list.

2) But how do we know that the list on the wiki is complete?  How do
we know that it is not missing anything?

3) Running RAT against the source is how we ensure that the code is clean.

In other words, the criterion should be that we have a clean RAT
record, not that we have a clean wiki.  The list of modules on the
wiki is not traceable to a scan of the source code.  It is not
reproducible.  It might be useful.  But it is not sufficient.
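
A rough sketch of what a reproducible, data-driven check of this kind could look like. This is not RAT itself, only a simplified stand-in written in Python: it walks a source tree, looks for the marker text of the standard Apache license header near the top of each file, and skips files named in an annotated "cleared" list. The list format (one `path # reason` entry per line) and the names `load_cleared` and `scan` are assumptions made for illustration, not anything RAT or the project defines.

```python
import os

# Marker text that appears in the standard Apache License 2.0 source header.
APACHE_MARKER = b"Licensed to the Apache Software Foundation"


def load_cleared(path):
    """Read a hypothetical cleared-files list: '<relative path> # <reason>'.

    Blank lines and lines starting with '#' are ignored.
    Returns a dict mapping relative path -> documented reason.
    """
    cleared = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            entry, _, reason = line.partition("#")
            cleared[entry.strip()] = reason.strip()
    return cleared


def scan(root, cleared):
    """Return sorted relative paths that lack the header marker
    and are not documented in the cleared list."""
    unresolved = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Subversion metadata directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in cleared:
                continue
            # Only the first few KB are read; headers sit at the top of a file.
            with open(full, "rb") as fh:
                head = fh.read(4096)
            if APACHE_MARKER not in head:
                unresolved.append(rel)
    return sorted(unresolved)
```

If the cleared list is kept in SVN next to the script, every exception carries its reason, and rerunning `scan` before a release only surfaces files that are new or still unresolved.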


> Marcus
>>> IMHO this should work and there is always an up-to-date overview.
>>> Marcus
>>>> Note also that doing this kind of check is a prerequisite for every
>>>> release we do at Apache.  So agreeing on what tools and techniques we
>>>> want to use for this process is important.  If we do it right, the
>>>> next time we do a review it will be very fast and easy, since we'll be
>>>> able to build upon the review we've already done. That's why I think
>>>> that either using svn properties or scripts with annotated data files
>>>> listing "cleared" files is the best approach.  Make the review process
>>>> be data-driven and reproducible using automated tools.  It won't
>>>> totally eliminate the need for manual inspection, but it will: 1) Help
>>>> parallelize that effort, and 2) Ensure it is only done once per file.
>>>>> Juergen
>>>>>> -Rob
>>>>>> [1]
>>>>>> [2]
