incubator-ctakes-dev mailing list archives

From Jörn Kottmann <>
Subject Re: Creative Commons License (was: checking in wiki)
Date Thu, 03 Jan 2013 08:43:05 GMT
On 01/03/2013 01:33 AM, Benson Margulies wrote:
> On Wed, Jan 2, 2013 at 2:30 PM, Tim Miller wrote:
>> The license is share alike 3.0, the reason we need advice is because we are
>> using a modified/derived version (the clause in the legal FAQ starts
>> "Unmodified media..."). Specifically, we built a lucene index with 5000
>> wikipedia articles relating to medicine. Each article is modified by
>> reducing it to a list of words and their counts in that article. Is there some
>> advice on whether this sort of modification is allowable or whether it
>> disqualifies?
> A language model derived from a corpus is not necessarily a derived
> work of the corpus. Opinions vary. Some would tell you that it's a new
> work entirely, and you own it. Others would tell you that you need a
> specific license from the original content owners.

The answer probably also depends a lot on the legal system of the
country you are in. As far as I know, things are stricter in some European
countries, since they do not have a fair use clause like the US does.

Media monitoring companies, for example, get away with using short
extracts (a couple of words or sentences) from news articles and selling
them to their customers as their own work.

Statistical models usually contain much shorter pieces of text, often just
bigrams or trigrams, and cannot be used to reconstruct longer pieces of text.
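
As a rough illustration of that last point, here is a toy Python sketch.
The function name and the two example sentences are made up for this
mail and have nothing to do with the actual cTAKES index. It only shows
that two different sentences can have identical word counts and
identical bigram counts, so in this case the counts alone do not say
which sentence they came from.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Two hypothetical sentences that differ only in word order.
a = "the patient was given aspirin and the doctor was given notice"
b = "the doctor was given aspirin and the patient was given notice"

# Compare the sentences and their unigram, bigram and trigram counts.
print("same text:    ", a == b)
print("same unigrams:", ngram_counts(a, 1) == ngram_counts(b, 1))
print("same bigrams: ", ngram_counts(a, 2) == ngram_counts(b, 2))
print("same trigrams:", ngram_counts(a, 3) == ngram_counts(b, 3))
```

Longer n-grams keep more of the original word order, so how much can be
recovered depends on the n-gram length and on the text. This is only a
small demonstration, not a general guarantee.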

