www-legal-discuss mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LEGAL-309) Apache OpenNLP wants to release models trained on Universal Dependency under AL 2.0
Date Fri, 07 Jul 2017 16:20:00 GMT

    [ https://issues.apache.org/jira/browse/LEGAL-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078321#comment-16078321 ]

Chris A. Mattmann commented on LEGAL-309:
-----------------------------------------

In the end, this comes down to accepted risk. I think the risk is very small or negligible
in the case of UD, but higher in the case of LDC, especially since Joern reached out to them
specifically with a non-hypothetical case and they did not readily accept it.

I read LEGAL-317 and will move this part of the discussion there. Based on the above, I am
unlikely to change the previous decision related to LDC.

> Apache OpenNLP wants to release models trained on Universal Dependency under AL 2.0
> -----------------------------------------------------------------------------------
>
>                 Key: LEGAL-309
>                 URL: https://issues.apache.org/jira/browse/LEGAL-309
>             Project: Legal Discuss
>          Issue Type: Question
>            Reporter: Joern Kottmann
>            Assignee: Chris A. Mattmann
>
> The OpenNLP project develops statistical natural language processing software which needs
> to be trained in order to produce a model that can be used to perform one of our supported
> tasks such as part-of-speech tagging or lemmatization.
> We would like to know if it would be possible to train models on data included in UD
> which itself is licensed under various licenses and then release the trained models under
> AL 2.0.
> If you go to [1] you can see a list of data files and their license.
> Here is a list of the licenses:
> CC BY 4.0
> CC BY SA 4.0
> CC BY-NC-SA 2.5, 3.0, 4.0 and without version
> CC BY-NC-SA US 3.0
> CC BY-SA 4.0 
> GPL
> LGPLLR
> The models we would like to train on that data are:
> - Part-of-Speech models (contains bigrams and a set of individual words of the training
> text)
> - Lemmatizer (contains a set of individual words of the training text)
> As far as we understand, individual words or very short phrases extracted from a corpus
> are not protected by its original copyright. As far as we know, the above licenses do not
> forbid deriving statistics from their content.
> [1] http://universaldependencies.org/
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


