ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject Re: roadmap for Apache cTakes "big data" processing
Date Mon, 29 Apr 2013 00:39:52 GMT
Has anyone tried Mahout recently?
Last time I tried, it was still closely tied to the Hadoop file system. 

Sent from my iPhone

On Apr 28, 2013, at 7:44 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:

> I encourage committers to check out Apache Mahout:
> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> 
> Why Apache Mahout? 
> 1. provides ML classifiers and functions not available through UIMA
> 2. parallel by design, transparently invokes Hadoop  
> 3. Java and Apache license (every other known toolkit is GPL!) 
> 4. likely to become standard ML package for Apache 
> 
> Why would we use Mahout in cTakes? 
> cTakes models are "provided", for example for PoS tagging. 
> Retraining these models on your own compute cluster would be difficult (in my opinion).

> LibSVM is nice, but it is only one classification method. 
> 
> When? 
> No rush; however, I suggest we don't invest time in porting SINGLE-CPU classifier functions that we will have to parallelize later. 
> 
> Summary: 
> UIMA + Mahout = pipelines + classification 
> 
> 
> 
> 
> On Apr 28, 2013, at 4:26 PM, "Savova, Guergana" <Guergana.Savova@childrens.harvard.edu> wrote:
> 
>> +1 
>> --guergana
>> 
>> -----Original Message-----
>> From: Kaggal, Vinod C. [mailto:Kaggal.Vinod@mayo.edu] 
>> Sent: Saturday, April 27, 2013 11:21 PM
>> To: <dev@ctakes.apache.org>
>> Cc: <dev@ctakes.apache.org>
>> Subject: Re: roadmap for Apache cTakes "big data" processing
>> 
>> +1
>> 
>> 
>> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:
>> 
>>> +1 for UIMA-AS
>>> 
>>> 
>>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:
>>> 
>>>> I'm writing to gauge community interest and intent for parallel processing with cTakes. 
>>>> 
>>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM. 
>>>> http://uima.apache.org/doc-uimaas-what.html
>>>> 
>>>> Apache Mahout is likely to become the de facto Apache package for machine learning. 
>>>> http://mahout.apache.org/
>>>> 
>>>> I believe cTakes will embrace both of these in due time.  
>>>> Do you agree or do you have a different view?
> 
