www-legal-discuss mailing list archives

From Joern Kottmann <kottm...@gmail.com>
Subject Re: Training models for OpenNLP on the Universial Dependency corpus
Date Fri, 19 May 2017 11:13:09 GMT

I opened an issue for this:

We would like to get this resolved soon. The project released OpenNLP
1.8.0 yesterday, and we would now like to release pre-trained models as
well, but we need to resolve this question first.


On Wed, May 10, 2017 at 4:31 PM, Joern Kottmann <kottmann@gmail.com> wrote:

> Hello all,
> we already had this discussion for OntoNotes [1], and I would like to know
> what the situation is for the Universal Dependencies [2] corpus.
> The OpenNLP project develops statistical natural language processing
> software which needs to be trained in order to produce a model that can be
> used to perform one of our supported tasks such as part-of-speech tagging
> or lemmatization.
> We would like to know whether it is possible to train models on data
> included in UD, which is licensed under various Creative Commons
> licenses (e.g. CC BY-NC 3.0/4.0, CC BY-SA 4.0, CC BY 4.0), the GPL, and
> others, and then license the trained models under the Apache License 2.0
> (AL 2.0).
> If you go to [2] you can see a list of the data files and their licenses.
> As far as we understand, those licenses don't explicitly disallow using the
> content for training models, as is the case with the OntoNotes LDC
> license.
> The models we would like to train on that data are:
> - Part-of-speech models (contain bigrams and a set of individual words of
> the training text)
> - Lemmatizer models (contain a set of individual words of the training text)
> Jörn
> [1] http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201702.mbox/%3CCA%2BV%3DWqhEsBWDb%2BQ%2BaEkjfO_FmGoPx2yGiw2oHYjQrWpaUGmoNw%40mail.gmail.com%3E
> [2] http://universaldependencies.org/
