uima-user mailing list archives

From Nicolas Paris <nipari...@gmail.com>
Subject Re: UIMA analysis from a database
Date Fri, 15 Sep 2017 22:28:48 GMT
John

I looked at this project some time ago. The way I use Spark doesn't
need that complexity: just distribute the texts in an RDD and pass them
through the pipeline; done.

About 10 lines of Scala code are enough in my case, and they can be
adapted to the source (database, CSV, PDFs...). Moreover, the GitHub
project has been dead for 4 years...
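
For readers of the archive, the approach described above might look roughly like the following sketch. It is not the code referred to in this message. It assumes Spark and uimaFIT are on the classpath, and `TokenMarker` and `RddPipeline` are hypothetical names invented for illustration; a real pipeline would substitute its own annotators and type system.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.uima.fit.component.JCasAnnotator_ImplBase
import org.apache.uima.fit.factory.{AnalysisEngineFactory, JCasFactory}
import org.apache.uima.fit.util.JCasUtil
import org.apache.uima.jcas.JCas
import org.apache.uima.jcas.tcas.Annotation

// Hypothetical stand-in annotator: adds one Annotation per whitespace-separated token.
class TokenMarker extends JCasAnnotator_ImplBase {
  override def process(jcas: JCas): Unit = {
    val m = java.util.regex.Pattern.compile("\\S+").matcher(jcas.getDocumentText)
    while (m.find()) new Annotation(jcas, m.start(), m.end()).addToIndexes()
  }
}

object RddPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("uima-rdd").master("local[*]").getOrCreate()

    // The texts could equally come from a database, CSV files, or extracted PDFs.
    val texts = spark.sparkContext.parallelize(
      Seq("first document", "a second, longer document"))

    val counts = texts.mapPartitions { docs =>
      // Engines and CASes are not serializable, so build them once per partition
      // on the executor instead of shipping them from the driver.
      val engine = AnalysisEngineFactory.createEngine(classOf[TokenMarker])
      val jcas = JCasFactory.createJCas()
      docs.map { text =>
        jcas.reset()                 // reuse the same CAS for every document
        jcas.setDocumentText(text)
        engine.process(jcas)
        // Count every Annotation in the CAS, including any the framework adds itself.
        (text, JCasUtil.select(jcas, classOf[Annotation]).size())
      }
    }

    counts.collect().foreach(println)
    spark.stop()
  }
}
```

Creating the engine inside `mapPartitions` rather than `map` is a design choice in this sketch: it avoids paying the engine start-up cost once per document.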

On 15 Sept. 2017 at 16:32, Osborne, John D wrote:
> Thanks Richard and Nicholas,
> 
> Nicholas - have you looked at SUIM (https://github.com/oaqa/suim) ?
> 
> It's also doing UIMA on Spark - I'm wondering if you are aware of it and how it is different
> from your own project?
> 
> Thanks for any info,
> 
>  -John
> 
> 
> ________________________________________
> From: Richard Eckart de Castilho [rec@apache.org]
> Sent: Friday, September 15, 2017 5:29 AM
> To: user@uima.apache.org
> Subject: Re: UIMA analysis from a database
> 
> On 15.09.2017, at 09:28, Nicolas Paris <niparisco@gmail.com> wrote:
> >
> > - UIMA-AS is another way to program UIMA
> 
> Here you probably meant uimaFIT.
> 
> > - UIMA-FIT is complicated
> > - UIMA-FIT only work with UIMA
> 
> ... and I suppose you mean UIMA-AS here.
> 
> > - UIMA only focuses on text Annotation
> 
> Yep. Although it has also been used for other media, e.g. video and audio.
> But the core UIMA framework doesn't specifically consider these media.
> People who apply UIMA in the context of other media do so with custom
> type systems.
> 
> > - UIMA is not good at:
> >       - text transformation
> 
> It is not straight-forward but possible. E.g. the text normalizers in
> DKPro Core make use of either different views for different states of
> normalization or drop the original text and forward the normalized
> text within a pipeline by means of a CAS multiplier.
> 
> >       - read data from source in parallel
> >       - write data to folder in parallel
> 
> Not sure if these two are limitations of the framework
> rather than of the way that you use readers and writers
> in the particular scale-out mode you are working with.
> 
> >       - machine learning interface
> 
> UIMA doesn't offer ML as part of the core framework because
> that is simply not within the scope of what the UIMA framework
> aims to achieve.
> 
> There are various people who have built ML around UIMA, e.g.
> ClearTK (http://cleartk.github.io/cleartk/) or DKPro TC
> (https://dkpro.github.io/dkpro-tc/) - and as you did, it
> can be combined in various ways with frameworks that
> specialize in ML.
> 
> 
> Cheers,
> 
> -- Richard
> 
> 
