incubator-ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: [DISCUSS] What should we do with cTAKES resources?
Date Wed, 21 Nov 2012 22:01:34 GMT
The first pass at separating the UMLS resources from the ASF codebase is ready.
Developers can now pick and choose the cTAKES resources they need by artifactId.

Details: the following steps were required:
1) The UMLS resource project(s) have been moved to SourceForge under a new project:
http://svn.code.sf.net/p/ctakesresources/code/trunk/
[New account and space created for net.sourceforge.ctakesresources]

2) The new modules have been deployed to OSS Sonatype and Maven Central:
https://oss.sonatype.org/index.html#nexus-search;quick~ctakesresources
[New account and space created for net.sourceforge.ctakesresources]

3) The relevant cTAKES modules (e.g., ctakes-dictionary-lookup/pom.xml) now just need to
include:
			<dependency>
				<groupId>net.sourceforge.ctakesresources</groupId>
				<artifactId>ctakes-resources-umls2011ab</artifactId>
				<version>3.0.0</version>
			</dependency>
4) Finally, to make this transparent for developers, I added the maven-dependency-plugin:unpack-dependencies
goal to unzip them into target. This is needed because components such as Lucene require the resources
as unpacked files rather than inside a jar.
4a) End users can simply download the resources zip file from https://sourceforge.net/projects/ctakesresources/files/,
add it to their resources folder, and provide their UMLS username/password at runtime.
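
For illustration, the unpack step in 4) could look roughly like the pom fragment below. This is a
minimal sketch, not the actual cTAKES pom: the execution id, the process-resources phase, and the
output directory are assumptions. Only the plugin, its unpack-dependencies goal, and the
net.sourceforge.ctakesresources groupId come from the steps above. includeGroupIds restricts the
unpacking to the externally hosted resource jars so other dependencies are left alone.

			<build>
				<plugins>
					<plugin>
						<groupId>org.apache.maven.plugins</groupId>
						<artifactId>maven-dependency-plugin</artifactId>
						<executions>
							<execution>
								<!-- hypothetical execution id -->
								<id>unpack-ctakes-resources</id>
								<!-- assumed phase: unpack before tests/runtime need the files -->
								<phase>process-resources</phase>
								<goals>
									<goal>unpack-dependencies</goal>
								</goals>
								<configuration>
									<!-- only unpack the separately hosted resource artifacts -->
									<includeGroupIds>net.sourceforge.ctakesresources</includeGroupIds>
									<!-- assumption: unzip directly under target/ -->
									<outputDirectory>${project.build.directory}</outputDirectory>
								</configuration>
							</execution>
						</executions>
					</plugin>
				</plugins>
			</build>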

Note: Only the UMLS resources have been separated so far, because of their incompatibility with ASF licensing,
but other projects should be able to use the same mechanism.

--Pei

> -----Original Message-----
> From: Jörn Kottmann [mailto:kottmann@gmail.com]
> Sent: Monday, November 05, 2012 7:42 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] What should we do with cTAKES resources?
> 
> In my opinion we should release what we can from here at Apache and only
> the resources which have an incompatible license need to be handled
> differently, e.g. external site.
> 
> Models which are trained on private clinical data can be released as long as
> the original creator decides to license them under AL 2.0. If that is done by a
> committer it should be fine to just check them in or put them on the website.
> 
> The Wikipedia license is compatible, and so is an index built from it, but we
> probably need an attribution for it in a NOTICE file, and maybe to include
> the license in the LICENSE file.
> 
> Jörn
> 
> On 11/02/2012 10:46 PM, Chen, Pei wrote:
> > I think we postponed this topic previously and since the ASF code seems to
> be in decent shape now, I think it's time to revisit this discussion for the
> longer term.
> > Currently, we have the below resources bundled with our source code
> > and distribution
> >
> > -          UMLS dictionaries (hsqldb format and in lucene indexes)
> >
> > -          Models (which were cleared to be released as open source) that have
> been trained on various clinical data
> >
> > -          Wikipedia index
> >
> > What are our options, given that ASF source code, binaries, models, and
> > dependencies all need to be compliant with ASL 2.0
> > (http://www.apache.org/legal/3party.html)?
> >
> > 1)      Leave things as they are, but we would need to confirm with the sources and
> will probably also need to seek approval from Apache Legal for each of the
> resources
> >
> > 2)      Host the resources externally such as SourceForge similar to OpenNLP
> models (http://opennlp.sourceforge.net/models-1.5/)
> >
> > a.       Single zip per release for users to download?
> >
> > Option 2 seems the least painful in terms of compliance.
> > Since 3.0.0-incubating, each resource has a fully qualified name/path and is
> read from the classpath, so it should be fairly easy to pull the resources in
> from external sources if we decide to.
> >
> > --Pei
> >
> >

