uima-user mailing list archives

From Petr Baudis <pa...@ucw.cz>
Subject Re: UIMAj3 ideas
Date Thu, 16 Jul 2015 16:25:10 GMT

On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
> Good comments which will likely generate lots of responses.
> For now please see comments on scaleout below.
> On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis <pasky@ucw.cz> wrote:
> >   * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
> >     UIMA.  It seems to me that UIMA-AS is doing things a bit differently
> >     than what the original UIMA idea of doing scaleout was.  The two
> >     things don't play well together.  I'd love a way to easily take
> >     my plain UIMA pipeline and scale it out, ideally without any code
> >     changes, *and* avoid the terrible XML config files.
> >
> >
> Not clear what you are referring to as the "original UIMA idea of doing
> scaleout", the CPE? Core UIMA is a single-threaded, embeddable framework.
> UIMA-AS is also an embeddable framework that offers flexible vertical
> (multi-threading) and horizontal (multi-process) options for deploying an
> arbitrary pipeline. Admittedly scaleout with UIMA-AS is complicated and
> the minimal support for process management makes it difficult to do
> scaleout simply. In what ways do you think UIMA-AS is inconsistent with
> UIMA or UIMA scaleout?

  Well, my impression after delving into some UIMA internals was that
the original idea was to use the Analysis Structure Broker to control
the pipeline flow and it would seem natural that when doing scale-out,
one would simply provide a different ASB.  Its javadoc even reads:

> The Analysis Structure Broker (<code>ASB</code>) is the component
> responsible for the details of communicating with Analysis Engines
> that may potentially be distributed across different physical
> machines.

Of course, maybe I got it wrong.

> DUCC is a full cluster management application that will scale out a plain
> UIMA pipeline with no code changes, assuming that the application code is
> threadsafe. But a typical pipeline with a single collection reader
> creating input CASes and a single CAS consumer will limit scaleout
> performance pretty quickly. DUCC makes it easy to eliminate the input data
> bottleneck. DUCC sample apps show one approach to eliminating the output
> bottleneck. Have you looked at

  I use a UIMA pipeline for question answering, where each question
currently takes ~30s (single-threaded) to process (much of it spent
waiting on databases), so I don't think I'd hit such a bottleneck.
I did spend a few tens of minutes looking at DUCC, but I got the
impression that it's not really trivial to set up.

  One of my goals is to minimize setup hassles for anyone who wants to
run my software - ideally, they should be able to just compile and run.
If I started to use DUCC, I'm not sure to what degree I could preserve
this, and at the very least it would be another element in the already
steep learning curve for anyone who wants to tinker with the system.

  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
resource sharing - though from one of your previous emails, I got the
impression that I could run multiple AEs in threads of a single Java
process; but I guess by that point I had already decided that I wanted
to try something less complex.)
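
  For illustration only, here is a minimal sketch of that "multiple
threads in a single Java process" option using plain java.util.concurrent.
It is not UIMA, UIMA-AS, or DUCC code. The class name, processAll(), and
the pipelineFactory parameter are hypothetical; the Function<Q, A> stands
in for one whole pipeline instance, under the assumption that an instance
must not be shared between threads. Because each question mostly waits on
databases, running several questions at once should overlap those waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch: process questions concurrently inside one JVM. */
class ThreadedPipelineSketch {

    /**
     * Runs every question through a pipeline on a fixed-size thread pool.
     * pipelineFactory is an assumed stand-in for "build one pipeline";
     * it is called at most once per worker thread.
     */
    static <Q, A> List<A> processAll(List<Q> questions,
                                     Supplier<Function<Q, A>> pipelineFactory,
                                     int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Each worker thread lazily builds its own pipeline instance,
        // so no instance is ever used by two threads.
        ThreadLocal<Function<Q, A>> perThread =
                ThreadLocal.withInitial(pipelineFactory);
        try {
            List<Future<A>> futures = new ArrayList<>();
            for (Q question : questions) {
                Callable<A> task = () -> perThread.get().apply(question);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until done
            // and rethrows any failure wrapped in an ExecutionException.
            List<A> answers = new ArrayList<>();
            for (Future<A> future : futures) {
                answers.add(future.get());
            }
            return answers;
        } finally {
            pool.shutdown();
        }
    }
}
```

  A caller would pass its list of questions, a factory that builds one
pipeline, and a thread count, e.g. processAll(questions, factory, 8).
Resources that are safe to share (read-only models, connection pools)
would be created once outside the factory and captured by it.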

				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
