ctakes-dev mailing list archives

From Andy McMurry <mcmurry.a...@gmail.com>
Subject Re: roadmap for Apache cTakes "big data" processing
Date Mon, 29 Apr 2013 01:39:20 GMT
Good point Pei. 

We would need to do a spike (a short sprint) at some point to see whether Mahout would be a good fit.
I ask because I'm planning how I will be using cTakes, and was wondering how others are planning as well.


Cheers, 
--Andy


On Apr 28, 2013, at 5:39 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:

> Has anyone tried Mahout recently?
> Last time I tried, it was still closely tied to the Hadoop file system. 
> 
> Sent from my iPhone
> 
> On Apr 28, 2013, at 7:44 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:
> 
>> I encourage committers to check out Apache Mahout
>> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
>> 
>> Why Apache Mahout? 
>> 1. provides ML classifiers and functions not available through UIMA
>> 2. parallel by design, transparently invokes Hadoop  
>> 3. Java and Apache license (every other known toolkit is GPL!) 
>> 4. likely to become standard ML package for Apache 
>> 
>> Why would we use Mahout in cTakes?
>> cTakes models are "provided", for example PoS tagging. 
>> Retraining these models on your own compute cluster would be difficult (in my opinion).
>> LibSVM is nice, but it is only one classification method. 
>> 
>> When?
>> No rush; however, I suggest we don't invest time in porting SINGLE-CPU classifier functions that we will have to parallelize later.
>> 
>> Summary: 
>> UIMA + Mahout = pipelines + classification
>> 
>> 
>> 
>> 
>> On Apr 28, 2013, at 4:26 PM, "Savova, Guergana" <Guergana.Savova@childrens.harvard.edu> wrote:
>> 
>>> +1 
>>> --guergana
>>> 
>>> -----Original Message-----
>>> From: Kaggal, Vinod C. [mailto:Kaggal.Vinod@mayo.edu] 
>>> Sent: Saturday, April 27, 2013 11:21 PM
>>> To: <dev@ctakes.apache.org>
>>> Cc: <dev@ctakes.apache.org>
>>> Subject: Re: roadmap for Apache cTakes "big data" processing
>>> 
>>> +1
>>> 
>>> 
>>> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:
>>> 
>>>> +1 for UIMA-AS
>>>> 
>>>> 
>>>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:
>>>> 
>>>>> I'm writing to gauge community interest and intent for parallel processing with cTakes.
>>>>> 
>>>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM. 
>>>>> http://uima.apache.org/doc-uimaas-what.html
>>>>> 
>>>>> Apache Mahout is likely to become the de facto Apache package for machine learning.
>>>>> http://mahout.apache.org/
>>>>> 
>>>>> I believe cTakes will embrace both of these in due time.  
>>>>> Do you agree or do you have a different view?
>> 

