ctakes-dev mailing list archives

From giri vara prasad nambari <girinamb...@gmail.com>
Subject Re: roadmap for Apache cTakes "big data" processing
Date Mon, 29 Apr 2013 04:34:54 GMT
It seems it is still tightly tied to Hadoop:

https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart


On Sun, Apr 28, 2013 at 9:39 PM, Andy McMurry <mcmurry.andy@gmail.com> wrote:

> Good point Pei.
>
> We would need to do a spike (short sprint) in the future to see if Mahout
> would be a good fit.
> I'm just wondering because I'm planning out how I will be using cTakes,
> and was wondering how others are planning as well.
>
>
> Cheers,
> --Andy
>
>
> On Apr 28, 2013, at 5:39 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu>
> wrote:
>
> > Has anyone tried Mahout recently?
> > Last time I tried, it was still closely tied to the Hadoop file system.
> >
> > Sent from my iPhone
> >
> > On Apr 28, 2013, at 7:44 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:
> >
> >> I encourage committers to check out Apache Mahout:
> >> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >>
> >> Why Apache Mahout?
> >> 1. provides ML classifiers and functions not available through UIMA
> >> 2. parallel by design, transparently invokes Hadoop
> >> 3. Java and Apache license (every other known toolkit is GPL!)
> >> 4. likely to become standard ML package for Apache
> >>
> >> Why would we use Mahout in cTakes?
> >> cTakes models are "provided", for example PoS tagging.
> >> Retraining these models on your own compute cluster would be difficult (in my opinion).
> >> LibSVM is nice, but it is only one classification method.
> >>
> >> When?
> >> No rush; however, I suggest we don't invest time in porting SINGLE-CPU classifier functions that we will have to parallelize later.
> >>
> >> Summary:
> >> UIMA + Mahout = pipelines + classification
> >>
> >>
> >>
> >>
> >> On Apr 28, 2013, at 4:26 PM, "Savova, Guergana" <Guergana.Savova@childrens.harvard.edu> wrote:
> >>
> >>> +1
> >>> --guergana
> >>>
> >>> -----Original Message-----
> >>> From: Kaggal, Vinod C. [mailto:Kaggal.Vinod@mayo.edu]
> >>> Sent: Saturday, April 27, 2013 11:21 PM
> >>> To: <dev@ctakes.apache.org>
> >>> Cc: <dev@ctakes.apache.org>
> >>> Subject: Re: roadmap for Apache cTakes "big data" processing
> >>>
> >>> +1
> >>>
> >>>
> >>> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:
> >>>
> >>>> +1 for UIMA-AS
> >>>>
> >>>>
> >>>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <mcmurry.andy@gmail.com> wrote:
> >>>>
> >>>>> I'm writing to gauge community interest and intent for parallel processing with cTakes.
> >>>>>
> >>>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM.
> >>>>> http://uima.apache.org/doc-uimaas-what.html
> >>>>>
> >>>>> Apache Mahout is likely to become the de facto Apache package for machine learning.
> >>>>> http://mahout.apache.org/
> >>>>>
> >>>>> I believe cTakes will embrace both of these in due time.
> >>>>> Do you agree or do you have a different view?
> >>
>
>
