incubator-ctakes-dev mailing list archives

From "Chen, Pei" <>
Subject RE: [DISCUSS] What should we do with cTAKES resources?
Date Wed, 21 Nov 2012 22:01:34 GMT
The first pass at separating the UMLS resources from the ASF code is ready.
Basically, developers can now pick and choose the cTAKES resources by artifactId.

Details: the steps below had to be done:
1) The UMLS resource project(s) are left behind on SourceForge under new projects:
[New account and space created for net.sourceforge.ctakesresources]

2) New modules were deployed to Sonatype OSS and Maven Central:;quick~ctakesresources
[New account and space created for net.sourceforge.ctakesresources]

3) The appropriate ctakes modules a.k.a ctakes-dictionary-lookup/pom.xml now just needs to
4) Finally, to make it transparent for developers, I added the maven-dependency-plugin:unpack-dependencies
goal to unzip them into target.  This is because things like Lucene need them as unpacked files
rather than inside a jar.
4a) End users can just download the resources zip file from
and add it to their resources folder, then provide their UMLS username/password during execution.
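
As a rough sketch of steps 3 and 4, the relevant part of a module's pom.xml might look
something like the fragment below. Only the groupId and the plugin goal come from this
message; the artifactId, version, execution id, phase, and output directory are
placeholders I am assuming for illustration, not the actual values.

```xml
<!-- Sketch only: artifactId, version, phase and outputDirectory are assumed placeholders. -->
<dependencies>
  <dependency>
    <groupId>net.sourceforge.ctakesresources</groupId>
    <!-- hypothetical artifactId; pick the resource artifact you actually need -->
    <artifactId>example-umls-resources</artifactId>
    <!-- hypothetical version -->
    <version>0.0.0</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <id>unpack-umls-resources</id>
          <!-- assumed phase; any phase before the resources are read would do -->
          <phase>process-resources</phase>
          <goals>
            <goal>unpack-dependencies</goal>
          </goals>
          <configuration>
            <!-- only unpack the externally hosted resource artifacts -->
            <includeGroupIds>net.sourceforge.ctakesresources</includeGroupIds>
            <!-- assumed location under target; the message only says "target" -->
            <outputDirectory>${project.build.directory}/classes</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Restricting the unpack to one groupId keeps the other dependencies as ordinary jars, so
only the resources that tools like Lucene need as plain files end up unpacked.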

Note: Only the UMLS resources have been separated for now, due to the ASF licensing incompatibilities,
but other projects should be able to do the same using this mechanism.


> -----Original Message-----
> From: Jörn Kottmann []
> Sent: Monday, November 05, 2012 7:42 AM
> To:
> Subject: Re: [DISCUSS] What should we do with cTAKES resources?
> In my opinion we should release what we can from here at Apache and only
> the resources which have an incompatible license need to be handled
> differently, e.g. external site.
> Models which are trained on private clinical data can be released as long as
> the original creator decides to license them under AL 2.0. If that is done by a
> committer it should be fine to just check them in or put them on the website.
> The Wikipedia license is compatible, and so is an index of it, but we
> probably need to have attribution for it in a NOTICE file, and maybe include
> the license in the LICENSE file.
> Jörn
> On 11/02/2012 10:46 PM, Chen, Pei wrote:
> > I think we postponed this topic previously and since the ASF code seems to
> > be in decent shape now, I think it's time to revisit this discussion for the
> > longer term.
> > Currently, we have the below resources bundled with our source code
> > and distribution
> >
> > -          UMLS dictionaries (hsqldb format and in lucene indexes)
> >
> > -          Models (which were okay to release as open source) that have
> > been trained on various clinical data
> >
> > -          Wikipedia index
> >
> > What are our options, given that ASF source code, binaries, models, and
> > dependencies all need to be compliant with ASL 2.0
> > (
> >
> > 1)      Leave things as they are, but we need to confirm with the sources and
> > also will probably need to seek approval from Apache Legal for each of the
> > resources
> >
> > 2)      Host the resources externally such as SourceForge similar to OpenNLP
> > models (
> >
> > a.       Single zip per release for users to download?
> >
> > Option 2 seems the least painful in terms of compliance.
> > Since 3.0.0-incubating, each resource has a fully qualified name/path and is
> > read from the classpath, so it should be fairly easy if we decide to pull it in
> > from external sources.
> >
> > --Pei
> >
> >
