incubator-ooo-dev mailing list archives

From "Marcus (OOo)" <marcus.m...@wtnet.de>
Subject Re: A systematic approach to IP review?
Date Mon, 19 Sep 2011 17:26:48 GMT
On 09/19/2011 07:05 PM, Rob Weir wrote:
> On Mon, Sep 19, 2011 at 12:43 PM, Marcus (OOo)<marcus.mail@wtnet.de>  wrote:
>> On 09/19/2011 04:47 PM, Rob Weir wrote:
>>>
>>> On Mon, Sep 19, 2011 at 8:13 AM, Marcus (OOo)<marcus.mail@wtnet.de> wrote:
>>>>
>>>> On 09/19/2011 01:59 PM, Rob Weir wrote:
>>>>>
>>>>> 2011/9/19 Jürgen Schmidt<jogischmidt@googlemail.com>:
>>>>>>
>>>>>> On Mon, Sep 19, 2011 at 2:27 AM, Rob Weir<robweir@apache.org> wrote:
>>>>>>
>>>>>>> If you haven't looked at it closely, it is probably worth a few
>>>>>>> minutes of your time to review our incubation status page [1],
>>>>>>> especially the items under "Copyright" and "Verify Distribution
>>>>>>> Rights".  It lists the things we need to do, including:
>>>>>>>
>>>>>>> -- Check and make sure that the papers that transfer rights to the
>>>>>>> ASF have been received. It is only necessary to transfer rights for
>>>>>>> the package, the core code, and any new code produced by the project.
>>>>>>>
>>>>>>> -- Check and make sure that the files that have been donated have
>>>>>>> been updated to reflect the new ASF copyright.
>>>>>>>
>>>>>>> -- Check and make sure that for all code included with the
>>>>>>> distribution that is not under the Apache license, we have the
>>>>>>> right to combine with Apache-licensed code and redistribute.
>>>>>>>
>>>>>>> -- Check and make sure that all source code distributed by the
>>>>>>> project is covered by one or more of the following approved
>>>>>>> licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or
>>>>>>> something with essentially the same terms.
>>>>>>>
>>>>>>> Some of this is already going on, but it is hard to get a sense of
>>>>>>> who is doing what and how much progress we have made.  I wonder if
>>>>>>> we can agree to a more systematic approach?  This will make it
>>>>>>> easier to see the progress we're making and it will also make it
>>>>>>> easier for others to help.
>>>>>>>
>>>>>>> Suggestions:
>>>>>>>
>>>>>>> 1) We need to get all files needed for the build into SVN.  Right
>>>>>>> now there are some that are copied down from the OpenOffice.org
>>>>>>> website during the build's bootstrap process.   Until we get the
>>>>>>> files all in one place it is hard to get a comprehensive view of
>>>>>>> our dependencies.
>>>>>>>
>>>>>>
>>>>>> Do you mean to check in the files under ext_source into svn and
>>>>>> remove them later on when we have cleaned up the code? Or do you
>>>>>> mean to put them somewhere on Apache Extras?
>>>>>> I would prefer to save these binary files on Apache Extras if
>>>>>> possible.
>>>>>>
>>>>>
>>>>>
>>>>> Why not just keep it in SVN?   Moving things to Apache-Extras does not
>>>>> help us with the IP review.   In other words, if we have a dependency
>>>>> on an OSS module that has an incompatible license, then moving that
>>>>> module to Apache Extras does not make that dependency go away.  We
>>>>> still need to understand the nature of the dependency: a build tool,
>>>>> a dynamic runtime dependency, a statically linked library, an
>>>>> optional extension, a necessary core module.
>>>>>
>>>>> If we find out, for example, that something in ext-sources is only
>>>>> used as a build tool, and is not part of the release, then there is
>>>>> nothing that prevents us from hosting it in SVN.   But if something is
>>>>> a necessary library and it is under GPL, then this is a problem even
>>>>> if we store it on Apache-Extras.
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2) Continue the CWS integrations.  Along with 1) this ensures that
>>>>>>> all the code we need for the release is in SVN.
>>>>>>>
>>>>>>> 3)  Files that Oracle include in their SGA need to have the Apache
>>>>>>> license header inserted and the Sun/Oracle copyright migrated to
>>>>>>> the NOTICE file.  Apache RAT (Release Audit Tool) [2] can be used
>>>>>>> to automate parts of this.
>>>>>>>
>>>>>>> 4) Once the SGA files have the Apache headers, then we can make
>>>>>>> regular use of RAT to report on files that are lacking an Apache
>>>>>>> header.  Such files might be in one of the following categories:
>>>>>>>
>>>>>>> a) Files that Oracle owns the copyright on and which should be
>>>>>>> included in an amended SGA
>>>>>>>
>>>>>>> b) Files that have a compatible OSS license which we are permitted
>>>>>>> to use.  This might require that we add a mention of it to the
>>>>>>> NOTICE file.
>>>>>>>
>>>>>>> c) Files that have an incompatible OSS license.  These need to be
>>>>>>> removed/replaced.
>>>>>>>
>>>>>>> d) Files that have an OSS license that has not yet been
>>>>>>> reviewed/categorized by Apache legal affairs.  In that case we need
>>>>>>> to bring it to their attention.
>>>>>>>
>>>>>>> e) (Hypothetically) files that are not under an OSS license at all.
>>>>>>> E.g., a Microsoft header file.  These must be removed.
>>>>>>>
>>>>>>> 5) We should track the resolution of each file, and do this
>>>>>>> publicly.  The audit trail is important.  Some ways we could do
>>>>>>> this might be:
>>>>>>>
>>>>>>> a) Track this in SVN properties.  So set ip:sga for the SGA files,
>>>>>>> ip:mit for files that are MIT licensed, etc.  This should be
>>>>>>> reflected in headers as well, but this is not always possible.  For
>>>>>>> example, we might have binary files where we cannot add headers, or
>>>>>>> cases where the OSS files do not have headers, but where we can
>>>>>>> prove their provenance via other means.
>>>>>>>
>>>>>>> b) Track this in a spreadsheet, one row per file.
>>>>>>>
>>>>>>> c) Track this in a text log file checked into SVN.
>>>>>>>
>>>>>>> d) Track this in an annotated script that runs RAT, where the
>>>>>>> annotations document the reason for cases where we tell it to
>>>>>>> ignore a file or directory.
>>>>>>>
>>>>>>> 6) Iterate until we have a clean RAT report.
>>>>>>>
>>>>>>> 7) Goal should be for anyone today to be able to see what work
>>>>>>> remains for IP clearance, as well as for someone 5 years from now
>>>>>>> to be able to tell what we did.  Tracking this on the community
>>>>>>> wiki is probably not good enough, since we've previously talked
>>>>>>> about dropping that wiki and going to MWiki.
>>>>>>>
>>>>>>
>>>>>> We talked about it, yes, but did we reach a final decision?
>>>>>>
>>>>>> The migrated wiki is available under http://ooo-wiki.apache.org/wiki
>>>>>> and can be used. Do we want to continue with this wiki now? It's
>>>>>> still not clear to me at the moment.
>>>>>>
>>>>>> But we need a place to document the IP clearance and under
>>>>>> http://ooo-wiki.apache.org/wiki/ApacheMigration we already have some
>>>>>> information.
>>>>>>
>>>>>
>>>>> This is not really sufficient. The wiki is talking about module-level
>>>>> dependencies.   This is a good start and useful for the high level
>>>>> discussion. But we need to look file-by-file.  We need to catch the
>>>>> case where (hypothetically) there is a single GPL header file sitting
>>>>> in a core OOo source directory.  So we need to review 100,000's of
>>>>> files.  Too big for a table on the wiki.
>>>>
>>>> If you think in files, then yes, it's too big.
>>>>
>>>> But when you split this up into the application modules and submodules
>>>> and sub-sub-modules, then different people can work in parallel when
>>>> it's known who is working in what module.
>>>>
>>>
>>> We don't really have a comprehensive view of the licenses in the
>>> source tree until we do a file-by-file scan.  Until we do that we just
>>> have an approximation.
>>>
>>> But once we have a detailed view, then it is natural to work on the
>>> larger chunks module-by-module.  Most files we need to worry about
>>> will be in a module where we will treat all files in that module the
>>> same way.  But until proven otherwise, we need to be alert to the
>>> possibility that there is a single non-OSS Microsoft header file
>>> sitting in a directory someplace.  I'm not saying this has actually
>>> happened, or that it is likely to have happened.  I'm just saying that
>>> our review needs to be detailed enough that we can catch such a
>>> problem if it occurs.
>>
>> I still see no problem with putting this into the wiki. Create some
>> structure and note what you are actually working on. Of course it is
>> expected that you look at every file in the respective module.
>>
>> Or how do you want to keep the overview and let others know what you are
>> doing? ;-)
>>
>
> I think the wiki is fine as a collaboration tool, to list tasks and
> who is working on them.  But that is not a substitute for running
> scans with the Apache Release Audit Tool (RAT) and working toward a
> clean report.
>
> Think of it this way:
>
> 1) We have a list of modules on the wiki that we need to replace.
> Great.  Developers can work on that list.

Yes, to keep the overview and to coordinate the work for every developer.

> 2) But how do we know that the list on the wiki is complete?  How do
> we know that it is not missing anything?

How do you know that you don't commit some bugs into the source? ;-)

We don't know. The list would be created to the best of our knowledge.

> 3) Running RAT against the source is how we ensure that the code is clean

OK, I don't know what this can do for us. Maybe it's the solution to
the problem.

How do you know that it is not skipping anything? I guess you simply 
would trust RAT that it is doing fine, right? ;-)
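
One way to make the "is it skipping anything?" question checkable, independent of the tool: compare the files that exist in the working copy against the files the scan says it looked at. The following is only a minimal sketch. It assumes the scan report can be reduced to a plain list of relative paths, which is an assumption and not a statement about RAT's actual output format; the function names and the sample paths are hypothetical.

```python
import os
import tempfile

def files_on_disk(root):
    """Collect the relative path of every file under root (SVN metadata excluded)."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into .svn bookkeeping directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found

def skipped_files(root, reported):
    """Return files that exist under root but are absent from the report."""
    return sorted(files_on_disk(root) - set(reported))

def demo():
    """Build a throwaway tree with hypothetical file names and check a partial report."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sal", "inc"))
        for rel in ("sal/inc/types.h", "sal/makefile.mk", "readme.txt"):
            with open(os.path.join(root, *rel.split("/")), "w") as fh:
                fh.write("placeholder\n")
        # Pretend the scan reported only two of the three files.
        reported = ["sal/inc/types.h", "readme.txt"]
        return skipped_files(root, reported)

if __name__ == "__main__":
    print(demo())
```

An empty result would mean every file in the tree appears in the report; anything listed was never looked at.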

BTW:
Does RAT produce a log file, so that we have a list of every file that
was checked? This could be very helpful.
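
To illustrate the kind of log meant here, below is a minimal sketch of a per-file scan. It is not RAT and does not claim to reproduce what RAT does; it only checks whether the opening words of the standard ASF source header appear near the top of each file. The `cleared` mapping, the status strings, and the sample file names are all hypothetical, standing in for an annotated list of files that were reviewed by hand (binary files, for example).

```python
import os
import tempfile

# Opening words of the standard ASF source header.
ASF_MARKER = b"Licensed to the Apache Software Foundation"

def scan_tree(root, cleared=None, head_bytes=4096):
    """Return a sorted list of (relative_path, status) for every file under root.

    cleared is a hypothetical mapping of relative path -> reason for files
    that were reviewed manually and need no header check.
    """
    cleared = cleared or {}
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip SVN metadata directories; every other file is visited.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in cleared:
                status = "cleared: " + cleared[rel]
            else:
                with open(full, "rb") as fh:
                    head = fh.read(head_bytes)
                status = "apache-header" if ASF_MARKER in head else "NO-HEADER"
            results.append((rel, status))
    return sorted(results)

def write_log(results, log_path):
    """Write one line per checked file so two runs can be diffed."""
    with open(log_path, "w", encoding="utf-8") as log:
        for rel, status in results:
            log.write(f"{status}\t{rel}\n")

def demo():
    """Scan a tiny throwaway tree with hypothetical file names."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sw"))
        with open(os.path.join(root, "sw", "a.cxx"), "wb") as fh:
            fh.write(b"/* " + ASF_MARKER + b" (ASF) ... */\n")
        with open(os.path.join(root, "sw", "b.h"), "wb") as fh:
            fh.write(b"// no license text here\n")
        with open(os.path.join(root, "logo.png"), "wb") as fh:
            fh.write(b"\x89PNG")
        return scan_tree(
            root, cleared={"logo.png": "binary, provenance checked by hand"}
        )

if __name__ == "__main__":
    for rel, status in demo():
        print(status, rel, sep="\t")
```

Because every file gets exactly one line, the log doubles as the list of what was checked, and the reason for each manual exception is recorded next to the file it applies to.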

Marcus



> In other words, the criteria should be that we have a clean RAT
> record, not that we have a clean wiki.  The list of modules on the
> wiki is not traceable to a scan of the source code.  It is not
> reproducible.  It might be useful.  But it is not sufficient.
>
> -Rob
>
>> Marcus
>>
>>
>>
>>>> IMHO this should work and there is always an up-to-date overview.
>>>>
>>>> Marcus
>>>>
>>>>
>>>>
>>>>> Note also that doing this kind of check is a prerequisite for every
>>>>> release we do at Apache.  So agreeing on what tools and techniques we
>>>>> want to use for this process is important.  If we do it right, the
>>>>> next time we do a review it will be very fast and easy, since we'll be
>>>>> able to build upon the review we've already done. That's why I think
>>>>> that either using svn properties or scripts with annotated data files
>>>>> listing "cleared" files is the best approach.  Make the review process
>>>>> be data-driven and reproducible using automated tools.  It won't
>>>>> totally eliminate the need for manual inspection, but it will: 1) Help
>>>>> parallelize that effort, and 2) Ensure it is only done once per file.
>>>>>
>>>>>> Juergen
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Rob
>>>>>>>
>>>>>>>
>>>>>>> [1] http://incubator.apache.org/projects/openofficeorg.html
>>>>>>>
>>>>>>> [2] http://incubator.apache.org/rat/
