www-legal-discuss mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Training models for OpenNLP on the OntoNotes corpus
Date Mon, 20 Feb 2017 18:26:37 GMT
Hi Jörn,

thanks for the result! It would also be great if you could let us know when you find a new
suitable resource :)

One that might be suitable is the Georgetown University Multilayer Corpus [1], at least
the parts from Wikinews and Wikivoyage which are licensed CC-BY.

Cheers,

-- Richard

[1] https://corpling.uis.georgetown.edu/gum/#license

> On 17.02.2017, at 11:24, Joern Kottmann <kottmann@gmail.com> wrote:
> 
> Hello all,
> 
> they replied to me and said the main issue is that their data (or models trained on it)
> cannot be licensed under any agreements other than their own. This applies to both their
> research-only and commercial licenses.
> 
> Therefore, training on LDC data (even if done by a member with the commercial license)
> and releasing the model under AL 2.0 (or any other Open Source license) is not allowed.
> On the other hand, they seem to tolerate Open Source projects doing that; when
> you google for models trained on their data, you can find many examples.
> 
> We will have to look for new sources of data to train our models on.
> 
> Thanks to everyone for helping with this issue.
> 
> Jörn



