incubator-ctakes-dev mailing list archives

From Tim Miller <>
Subject Re: incorporating wikipedia features
Date Thu, 27 Sep 2012 21:32:36 GMT
That's helpful, but does it seem overly restrictive? The license itself
allows derived works to be released as long as they carry the same license.

And does this fall into that category? What if you built a model of the
probability of every word in English using Wikipedia as a corpus and
wanted to include that? Would that fall under this heading?
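For concreteness, here is a minimal sketch of the kind of word-probability model described above. It assumes a plain maximum-likelihood unigram model, P(w) = count(w) / total tokens, and a deliberately naive lowercase tokenizer. The function name `unigram_probabilities` and the toy documents are illustrative only and do not come from the thread.

```python
from collections import Counter
import re


def unigram_probabilities(documents):
    """Maximum-likelihood unigram model: P(w) = count(w) / total token count."""
    counts = Counter()
    for text in documents:
        # Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    # Normalize raw counts so the probabilities sum to 1.
    return {word: n / total for word, n in counts.items()}


# Toy stand-in for a corpus of article texts (hypothetical data).
docs = ["The cat sat.", "The dog sat on the mat."]
probs = unigram_probabilities(docs)
```

The resulting table holds only per-word frequencies, not any article text. Whether such a table counts as a derived work is the licensing question raised above.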


On 09/27/2012 04:11 PM, Masanz, James J. wrote:
> I found this on
> "Can Apache projects include Creative Commons Attribution-Share Alike works?
> Unmodified media under the Creative Commons Attribution-Share Alike 2.5 and Creative
> Commons Attribution-Share Alike 3.0 licenses may be included in Apache products, subject to
> the licenses' attribution clauses which may require LICENSE/NOTICE/README changes. For any
> other type of CC-SA licensed work, please contact the Legal PMC."
> -- James
>> -----Original Message-----
>> From:
>> []
>> On Behalf Of Tim Miller
>> Sent: Thursday, September 27, 2012 3:07 PM
>> To:
>> Subject: incorporating wikipedia features
>> Hi team,
>> I have built a small Lucene index from the Wikipedia dump that helps with
>> calculating a feature for the coreference module.  It gives a big
>> improvement in performance and it is likely that there are more features
>> that can be incorporated from this resource.
>> My question is about how to go about including this resource. Wikipedia's
>> Copyrights page says the text is available under the Creative Commons
>> Attribution-ShareAlike 3.0 License, which is very permissive. But I'm
>> wondering if anyone has any experience with this. Specifically, the
>> resource is a Lucene index of 5000 Wikipedia articles, where each indexed
>> document is a wiki entry with the title and slightly modified full text
>> (wiki syntax stripped and foreign characters removed).  Any knowledge on
>> this subject would be appreciated.
>> Thanks,
>> --
>> Tim Miller, PhD
>> Postdoctoral Research Fellow
>> Children's Hospital Informatics Program
>> Boston Children's Hospital and Harvard Medical School
>> 617-919-1223
