ctakes-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: roadmap for Apache cTakes "big data" processing
Date Mon, 29 Apr 2013 07:58:15 GMT
On 04/29/2013 01:43 AM, Andy McMurry wrote:
> I encourage committers to check out Apache Mahout
> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
>
> Why Apache Mahout?
> 1. provides ML classifiers and functions not available through UIMA
> 2. parallel by design, transparently invokes Hadoop
> 3. Java and Apache license (every other known toolkit is GPL!)
> 4. likely to become standard ML package for Apache
>
> Why would we use Mahout in cTakes?
> cTakes models are "provided", for example PoS tagging.
> Retraining these models on your own compute cluster would be difficult (in my opinion).
> LibSVM is nice, but it is only one classification method.
>

The Mahout classifiers will probably be integrated into OpenNLP soon;
here is the JIRA issue:
https://issues.apache.org/jira/browse/OPENNLP-574

The idea is to make the ML part of OpenNLP pluggable, so that all kinds of
classification libraries can be supported.

Mahout's clustering and LDA capabilities might also be interesting; they
could probably be run over the entire document database.

Jörn
