incubator-ctakes-dev mailing list archives

From Tim Miller
Subject incorporating wikipedia features
Date Thu, 27 Sep 2012 20:07:26 GMT
Hi team,
I have built a small Lucene index from the Wikipedia dump that helps
with calculating a feature for the coreference module.  It gives a big
improvement in performance, and it is likely that more features can be
incorporated from this resource.

My question is about how to go about including this resource.  The
Wikipedia Copyrights page says the text is available under the Creative
Commons Attribution-ShareAlike 3.0 License, which is very permissive, but
I'm wondering if anyone has any experience with this.  Specifically, the
resource is a Lucene index of 5000 Wikipedia articles, where each
indexed document is a wiki entry with the title and slightly modified
full text (wiki syntax stripped and foreign characters removed).  Any
knowledge on this subject would be appreciated.


Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Boston Children's Hospital and Harvard Medical School
