incubator-ctakes-dev mailing list archives

From "Chen, Pei" <>
Subject [DISCUSS] What should we do with cTAKES resources?
Date Fri, 02 Nov 2012 21:46:34 GMT
We postponed this topic previously; since the ASF code seems to be in decent shape
now, I think it's time to revisit this discussion for the longer term.
Currently, we have the following resources bundled with our source code and distribution:

-          UMLS dictionaries (in HSQLDB format and in Lucene indexes)

-          Models (which were okay to be released as open source) that have been trained on
various clinical data

-          Wikipedia index

What are our options? ASF source code, binaries, models, and dependencies all need to be compliant
with ASL 2.0 (

1)      Leave things as they are, but we would need to confirm with the sources and would probably
also need to seek approval from Apache Legal for each of the resources

2)      Host the resources externally, e.g. on SourceForge, similar to the OpenNLP models (

a.       Single zip per release for users to download?

Option 2 seems the least painful in terms of compliance.
Since 3.0.0-incubating, each resource has a fully qualified name/path and is read from the
classpath, so it should be fairly easy to pull them in from external sources if we decide to.
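
To illustrate why classpath-based loading makes option 2 easy, here is a minimal sketch in plain
Java. It is not cTAKES code: the class name, the helper methods, and the resource path
org/example/models/demo-model.txt are all made up for the example. It assumes Java 9+ and that the
downloaded resources have been unzipped into a directory, which is then simply added to the
classpath so the same fully qualified name resolves as before.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not part of cTAKES.
public class ExternalResourceSketch {

    // Reads a resource by its fully qualified classpath name using the given loader.
    static String readResource(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Builds a class loader that also sees an external (e.g. downloaded and unzipped)
    // resources directory, in addition to whatever is already on the classpath.
    static ClassLoader withExternalDir(Path dir) throws IOException {
        URL url = dir.toUri().toURL(); // existing directories get a trailing slash
        return new URLClassLoader(new URL[] { url },
                ExternalResourceSketch.class.getClassLoader());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the unzipped external download (assumption for the demo).
        Path root = Files.createTempDirectory("external-resources");
        Path model = root.resolve("org/example/models/demo-model.txt");
        Files.createDirectories(model.getParent());
        Files.write(model, "demo".getBytes(StandardCharsets.UTF_8));

        ClassLoader loader = withExternalDir(root);
        System.out.println(readResource(loader, "org/example/models/demo-model.txt"));
    }
}
```

In practice the same effect can be had without any code by putting the unzipped directory (or a
resources jar) on the classpath with -cp; the point is only that the lookup by fully qualified name
does not care where the bytes come from.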

