ctakes-dev mailing list archives

From Andy McMurry <mcmurry.a...@gmail.com>
Subject Re: roadmap for Apache cTakes "big data" processing
Date Sun, 28 Apr 2013 23:43:53 GMT
I encourage committers to check out Apache Mahout:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms

Why Apache Mahout? 
1. Provides ML classifiers and functions not available through UIMA
2. Parallel by design; transparently invokes Hadoop
3. Written in Java and Apache-licensed (every other known toolkit is GPL!)
4. Likely to become the standard ML package for Apache

Why would we use Mahout in cTakes?
cTakes models are "provided", for example the PoS tagging model.
Retraining these models on your own compute cluster would be difficult (in my opinion).
LibSVM is nice, but it is only one classification method. 

When?
No rush. However, I suggest we don't invest time in porting SINGLE-CPU classifier functions
that we will have to parallelize later.

Summary: 
UIMA + Mahout = pipelines + classification




On Apr 28, 2013, at 4:26 PM, "Savova, Guergana" <Guergana.Savova@childrens.harvard.edu> wrote:

> +1 
> --guergana
> 
> -----Original Message-----
> From: Kaggal, Vinod C. [mailto:Kaggal.Vinod@mayo.edu] 
> Sent: Saturday, April 27, 2013 11:21 PM
> To: <dev@ctakes.apache.org>
> Cc: <dev@ctakes.apache.org>
> Subject: Re: roadmap for Apache cTakes "big data" processing
> 
> +1
> 
> 
> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:
> 
>> +1 for UIMA-AS
>> 
>> 
>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:
>> 
>>> I'm writing to gauge community interest and intent for parallel processing with cTakes.
>>> 
>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM. 
>>> http://uima.apache.org/doc-uimaas-what.html
>>> 
>>> Apache Mahout is likely to become the de facto Apache package for machine learning.
>>> http://mahout.apache.org/
>>> 
>>> I believe cTakes will embrace both of these in due time.  
>>> Do you agree or do you have a different view? 
>>> 
>>> 
>>> 
>>> 
>>> 

