uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: UIMA analysis from a database
Date Fri, 15 Sep 2017 17:57:28 GMT
There are a few DUCC features that might be of particular interest for
scaling out UIMA analytics.

 - all user code for batch processing continues to use the existing UIMA
component model: collection readers, CAS multipliers, analysis engines, and
CAS consumers.**

 - DUCC supports assembling and debugging a single-threaded process with
these components, and then launching a highly scaled-out deployment with
no code change.

 - for applications that use too much RAM to be able to utilize all the
cores on worker machines, DUCC can do the vertical (thread) scaleout needed
to share memory.

 - DUCC automatically captures the performance breakdown of the UIMA-based
processes, as well as capturing process statistics including CPU, RAM,
swap, pagefaults and GC. Performance breakdown info for individual tasks
(DUCC work items) can optionally be captured.

 - DUCC has extensive error handling, automatically resubmitting work
associated with uncaught exceptions, process crashes, machine failures,
network failures, etc.

 - Exceptions are easy to get to, and DUCC tries to surface things that
might otherwise be tricky to find, such as all the reasons a process might
fail to start, without having to dig through DUCC framework logs.
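
To make the error-handling bullet concrete, here is a minimal sketch of the
general idea of resubmitting failed work items. It is not DUCC code: the
function name, the max_attempts limit, and the simulated failures are all
assumptions made for illustration, and DUCC's real policy is configured in
the framework rather than written by hand.

```python
from collections import deque


def run_with_resubmission(work_items, process, max_attempts=3):
    """Process each work item, re-queueing failures up to max_attempts.

    Hypothetical helper; it only illustrates the resubmission idea.
    """
    queue = deque((item, 1) for item in work_items)
    done, failed = {}, {}
    while queue:
        item, attempt = queue.popleft()
        try:
            done[item] = process(item)
        except Exception as exc:  # stands in for crashes, network errors, etc.
            if attempt < max_attempts:
                queue.append((item, attempt + 1))  # resubmit the work item
            else:
                failed[item] = repr(exc)  # give up and keep the error visible
    return done, failed


# Simulated analytic: "doc-2" fails once, "doc-3" fails every time.
calls = {}


def flaky(item):
    calls[item] = calls.get(item, 0) + 1
    if item == "doc-2" and calls[item] < 2:
        raise RuntimeError("simulated process crash")
    if item == "doc-3":
        raise ValueError("always fails")
    return item.upper()


done, failed = run_with_resubmission(["doc-1", "doc-2", "doc-3"], flaky)
```

Here "doc-2" succeeds on its second attempt, while "doc-3" ends up in
failed with its last exception once the attempts are used up.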

** DUCC services introduce a new user programmable component, a service
pinger, that is responsible for validating that a service is operating
correctly. The service pinger can also dynamically change the number of
instances of a service, and it can restart individual instances that are
determined to be acting badly.
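
As a rough illustration of what a pinger decides, the sketch below models
the three responsibilities described above: reporting health, flagging
instances to restart, and asking for more or fewer instances. It does not
use the DUCC pinger API; InstanceStatus, PingResult, evaluate_service, and
the thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class InstanceStatus:
    instance_id: int
    responding: bool
    avg_latency_ms: float


@dataclass
class PingResult:
    healthy: bool        # is the service as a whole usable?
    restart_ids: list    # instances judged to be acting badly
    instance_delta: int  # positive: add instances, negative: remove


def evaluate_service(statuses, queue_depth,
                     max_latency_ms=500.0, items_per_instance=10):
    """Decide health, restarts, and scaling from observed statistics."""
    restart_ids = [s.instance_id for s in statuses
                   if not s.responding or s.avg_latency_ms > max_latency_ms]
    good = len(statuses) - len(restart_ids)
    # Ceiling division: instances needed to cover the pending work.
    wanted = max(1, -(-queue_depth // items_per_instance))
    return PingResult(healthy=good > 0,
                      restart_ids=restart_ids,
                      instance_delta=wanted - len(statuses))


statuses = [
    InstanceStatus(1, True, 120.0),
    InstanceStatus(2, False, 0.0),    # not responding
    InstanceStatus(3, True, 900.0),   # too slow
]
result = evaluate_service(statuses, queue_depth=45)
```

With these made-up numbers, instances 2 and 3 are flagged for restart and
two additional instances are requested to cover the 45 queued items.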

Eddie

On Fri, Sep 15, 2017 at 10:32 AM, Osborne, John D <josborne@uabmc.edu>
wrote:

> Thanks Richard and Nicholas,
>
> Nicholas - have you looked at SUIM (https://github.com/oaqa/suim) ?
>
> It's also doing UIMA on Spark - I'm wondering if you are aware of it and
> how it is different from your own project?
>
> Thanks for any info,
>
>  -John
>
>
> ________________________________________
> From: Richard Eckart de Castilho [rec@apache.org]
> Sent: Friday, September 15, 2017 5:29 AM
> To: user@uima.apache.org
> Subject: Re: UIMA analysis from a database
>
> On 15.09.2017, at 09:28, Nicolas Paris <niparisco@gmail.com> wrote:
> >
> > - UIMA-AS is another way to program UIMA
>
> Here you probably meant uimaFIT.
>
> > - UIMA-FIT is complicated
> > - UIMA-FIT only work with UIMA
>
> ... and I suppose you mean UIMA-AS here.
>
> > - UIMA only focuses on text Annotation
>
> Yep. Although it has also been used for other media, e.g. video and audio.
> But the core UIMA framework doesn't specifically consider these media.
> People who apply UIMA in the context of other media do so with custom
> type systems.
>
> > - UIMA is not good at:
> >       - text transformation
>
> It is not straight-forward but possible. E.g. the text normalizers in
> DKPro Core make use of either different views for different states of
> normalization or drop the original text and forward the normalized
> text within a pipeline by means of a CAS multiplier.
>
> >       - read data from source in parallel
> >       - write data to folder in parallel
>
> Not sure if these two are limitations of the framework
> rather than of the way that you use readers and writers
> in the particular scale-out mode you are working with.
>
> >       - machine learning interface
>
> UIMA doesn't offer ML as part of the core framework because
> that is simply not within the scope of what the UIMA framework
> aims to achieve.
>
> There are various people who have built ML around UIMA, e.g.
> ClearTK (http://cleartk.github.io/cleartk/) or DKPro TC
> (https://dkpro.github.io/dkpro-tc/) - and as you did, it
> can be combined in various ways with frameworks that
> specialize in ML.
>
>
> Cheers,
>
> -- Richard
>
>
>
