www-legal-discuss mailing list archives

From Henri Yandell <bay...@apache.org>
Subject Re: Training models for OpenNLP on the OntoNotes corpus
Date Mon, 06 Feb 2017 16:35:03 GMT
I don't believe this is acceptable.

It's a non-commercial license that would restrict the uses of the
subsequent Apache product.

Note that the license would also need signing (i.e. it's not something we
can use off the shelf).

One approach would be to contact LDC to let them know of our interest in
using the corpus, while making sure they understand that the output would
go into a product under the Apache 2.0 license and that they understand
our concern.

Hen

On Fri, Feb 3, 2017 at 2:51 AM, Joern Kottmann <joern@apache.org> wrote:

> Hello all,
>
> the Apache OpenNLP library is a machine learning based toolkit for the
> processing of natural language text. It supports the most common NLP tasks,
> such as tokenization, sentence segmentation, part-of-speech tagging, named
> entity extraction, chunking and parsing.
>
> Many of the competing solutions offer pre-trained models on various data
> sources to their users. We came to the conclusion that we have to do the
> same to stay relevant.
>
> The corpora we would like to train on are usually copyright protected or
> have a license which restricts their use.
>
> I would like to know what the opinion here on legal-discuss is on training
> models based on the OntoNotes corpus [1]. Its license can be found here
> [2].
>
> The training process does the following with the corpus as input:
>
> - Generates string based features (e.g. about word shape, n-grams, various
> combinations, etc.); those features do not contain longer parts of the
> corpus text
>
> - Computes weights for those features based on the corpus
>
> The features and weights are stored together in what we call a model and
> this model we wish to distribute under AL 2.0 at Apache OpenNLP.
>
> Would it be ok to do that? Are there any concerns?
>
> Thanks,
>
> Jörn
>
>
> [1] https://catalog.ldc.upenn.edu/LDC2013T19
>
> [2] https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
>
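
For readers unfamiliar with the training process described in the quoted message, here is a minimal sketch of the two steps (generate string features, then compute weights for them). It is not OpenNLP code: the feature set, the function names, and the use of a simple perceptron are all assumptions made for illustration, and the toy sentences are invented rather than taken from any corpus.

```python
from collections import defaultdict


def word_shape(token):
    """Collapse characters to a coarse shape, e.g. 'Apache' -> 'Xxxxxx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def extract_features(tokens, i):
    """Short string features for the token at position i (hypothetical set)."""
    tok = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    feats = [
        "w=" + tok.lower(),
        "shape=" + word_shape(tok),
        "prev=" + prev_tok.lower(),
    ]
    # Character trigrams of the current token, with boundary markers.
    padded = "^" + tok.lower() + "$"
    feats += ["tri=" + padded[j:j + 3] for j in range(len(padded) - 2)]
    return feats


def predict(weights, labels, feats):
    """Return the label with the highest summed feature weight."""
    return max(labels, key=lambda lab: sum(weights.get((f, lab), 0.0) for f in feats))


def train(sentences, epochs=5):
    """Perceptron training (an assumed stand-in for the real trainer)."""
    weights = defaultdict(float)
    labels = sorted({lab for _, labs in sentences for lab in labs})
    for _ in range(epochs):
        for tokens, gold in sentences:
            for i, gold_label in enumerate(gold):
                feats = extract_features(tokens, i)
                pred = predict(weights, labels, feats)
                if pred != gold_label:
                    # Reward the correct label, penalise the wrong guess.
                    for f in feats:
                        weights[(f, gold_label)] += 1.0
                        weights[(f, pred)] -= 1.0
    # The "model" is just the label set plus the feature weights.
    return {"labels": labels, "weights": dict(weights)}


# Invented toy data, for illustration only.
toy = [
    (["Apache", "releases", "OpenNLP"], ["ORG", "O", "ORG"]),
    (["users", "train", "models"], ["O", "O", "O"]),
]
model = train(toy)
```

In this sketch the stored model holds only feature strings such as `shape=Xxxxxx` or `tri=^ap` together with their numeric weights, which is the kind of artifact the question above is about.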
