uima-user mailing list archives

From Thomas Ginter <thomas.gin...@utah.edu>
Subject Re: UIMAj3 ideas
Date Thu, 16 Jul 2015 19:42:58 GMT
Hi Petr,

Have you looked into using Leo?  It allows you to programmatically create Analysis Engines,
Aggregates, the type system, and launch everything in UIMA-AS without having to manage any
XML descriptors at all.  Furthermore it is available via Maven so your code can compile an


The only catch to running UIMA-AS is making sure the broker is running, a manual step that
we have not yet automated.  Other than that, it can scale most pipelines, with the notable exception
of pipelines that have really large resources.

As for ideas for UIMA 3, I would love to see a much simpler CAS system that didn’t require
types to be predefined before execution: for example, a very simple abstract base class that
defines an “annotation” and is then extended in order to create/use a new type.  It seems
like the basic location-based indexes could still be provided that way, as well as the option
of extending to provide custom indexes.  If the CAS were implemented as a base set of very
simple Java objects, we would also have more serialization options, possibly even making it
possible for the user to plug in a different serializer, such as protobuf, if required.  Just
a thought.
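
To make the idea concrete, here is a rough sketch of what such a CAS might look like. Everything in it is hypothetical: the names SimpleCasSketch, Annotation, Sentence, Token, SimpleCas and CasSerializer are made up for illustration and are not part of UIMA, UIMA-AS or Leo. It only assumes that a type is a plain Java subclass, that the "index" is a list kept in location order, and that serialization is a one-method interface the user can swap out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from UIMA or Leo.
final class SimpleCasSketch {

    /** Base type: an annotation is just a span [begin, end) over the document text. */
    static abstract class Annotation {
        final int begin;
        final int end;

        Annotation(int begin, int end) {
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("invalid span " + begin + ".." + end);
            }
            this.begin = begin;
            this.end = end;
        }
    }

    /** A new type is created by extending the base class; nothing is declared up front. */
    static final class Sentence extends Annotation {
        Sentence(int begin, int end) { super(begin, end); }
    }

    static final class Token extends Annotation {
        final String pos; // example feature stored as a plain field
        Token(int begin, int end, String pos) { super(begin, end); this.pos = pos; }
    }

    /** Pluggable serialization hook (could be backed by protobuf, JSON, ...). */
    interface CasSerializer {
        byte[] serialize(SimpleCas cas);
    }

    /** Holds the text plus annotations kept in a basic location-based order. */
    static final class SimpleCas {
        // Earlier begin first; for equal begins, the longer span first.
        private static final Comparator<Annotation> BY_LOCATION =
                Comparator.comparingInt((Annotation a) -> a.begin).thenComparingInt(a -> -a.end);

        final String text;
        private final List<Annotation> annotations = new ArrayList<>();

        SimpleCas(String text) { this.text = text; }

        void add(Annotation a) {
            if (a.end > text.length()) throw new IllegalArgumentException("span exceeds text");
            annotations.add(a);
            annotations.sort(BY_LOCATION);
        }

        /** All annotations of the given type (or a subtype), in location order. */
        <T extends Annotation> List<T> select(Class<T> type) {
            List<T> out = new ArrayList<>();
            for (Annotation a : annotations) {
                if (type.isInstance(a)) out.add(type.cast(a));
            }
            return out;
        }

        /** Annotations of the given type lying entirely inside the covering span. */
        <T extends Annotation> List<T> covered(Class<T> type, Annotation cover) {
            List<T> out = new ArrayList<>();
            for (T a : select(type)) {
                if (a != cover && a.begin >= cover.begin && a.end <= cover.end) out.add(a);
            }
            return out;
        }

        String coveredText(Annotation a) { return text.substring(a.begin, a.end); }
    }

    public static void main(String[] args) {
        SimpleCas cas = new SimpleCas("Leo scales UIMA. Brokers must run.");
        Sentence first = new Sentence(0, 16);
        cas.add(first);
        cas.add(new Token(4, 10, "VBZ"));   // added out of order on purpose
        cas.add(new Token(0, 3, "NNP"));
        cas.add(new Token(17, 24, "NNS"));

        for (Token t : cas.covered(Token.class, first)) {
            System.out.println(cas.coveredText(t) + "/" + t.pos);
        }

        // A trivial stand-in serializer; a protobuf-backed one would plug in the same way.
        CasSerializer plain = c -> c.text.getBytes(StandardCharsets.UTF_8);
        System.out.println(plain.serialize(cas).length + " bytes");
    }
}
```

Re-sorting on every add is only there to keep the sketch short; a real implementation would presumably use a sorted structure, and custom indexes would be additional comparators or lookups layered on the same objects.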


Thomas Ginter

> On Jul 16, 2015, at 10:25 AM, Petr Baudis <pasky@ucw.cz> wrote:
>  Hi!
> On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
>> Good comments which will likely generate lots of responses.
>> For now please see comments on scaleout below.
>> On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis <pasky@ucw.cz> wrote:
>>>  * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
>>>    UIMA.  It seems to me that UIMA-AS is doing things a bit differently
>>>    than what the original UIMA idea of doing scaleout was.  The two
>>>    things don't play well together.  I'd love a way to easily take
>>>    my plain UIMA pipeline and scale it out, ideally without any code
>>>    changes, *and* avoid the terrible XML config files.
>> Not clear what you are referring to as the "original UIMA idea of doing
>> scaleout", the CPE? Core UIMA is a single-threaded, embeddable framework.
>> UIMA-AS is also an embeddable framework that offers flexible vertical
>> (multi-threading) and horizontal (multi-process) options for deploying an
>> arbitrary pipeline. Admittedly, scaleout with UIMA-AS is complicated, and the
>> minimal support for process management makes it difficult to do scaleout
>> simply. In what ways do you think UIMA-AS is inconsistent with UIMA or UIMA
>> scaleout?
>  Well, my impression after delving into some UIMA internals was that
> the original idea was to use the Analysis Structure Broker to control
> the pipeline flow and it would seem natural that when doing scale-out,
> one would simply provide a different ASB.  Its javadoc even reads:
>> The Analysis Structure Broker (<code>ASB</code>) is the component
>> responsible for the details of communicating with Analysis Engines
>> that may potentially be distributed across different physical
>> machines.
> Of course, maybe I got it wrong.
>> DUCC is a full cluster management application that will scale out a plain UIMA
>> pipeline with no code changes, assuming that the application code is
>> threadsafe. But a typical pipeline with a single collection reader creating
>> input CASes and a single CAS consumer will limit scaleout performance pretty
>> quickly. DUCC makes it easy to eliminate the input data bottleneck. DUCC sample
>> apps show one approach to eliminating the output bottleneck. Have you looked
>> at DUCC?
>  I use a UIMA pipeline for question answering, where each question
> currently takes ~30s (single-threaded) to process (a lot of it spent
> waiting on databases), so I don't think I'd hit such a bottleneck.
> I did spend a few tens of minutes looking at DUCC, but I got the
> impression that it's not really trivial to set up.
>  One of my goals is to minimize setup hassles for anyone who wants to
> run my software - ideally, they should be able to just compile and run.
> If I started to use DUCC, I'm not sure to what degree I could preserve
> this, but at least it's another element in the already steep learning
> curve for anyone who wants to tinker with the system.
>  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
> resource sharing - though from one of your previous emails, I got the
> impression that I could run multiple AEs in threads of a single java
> process; but I guess at that point I had already decided that I wanted
> to try something less complex.)
> -- 
> 				Petr Baudis
> 	If you have good ideas, good data and fast computers,
> 	you can do almost anything. -- Geoffrey Hinton
