uima-user mailing list archives

From Petr Baudis <pa...@ucw.cz>
Subject [ANN] Multi-threaded UIMA ASB
Date Mon, 15 Jun 2015 00:20:57 GMT
  Hi!

  I have created an extension of UIMA that replaces its default ASB
with a multi-threaded one, so that if you have a CAS multiplier in your
pipeline, multiple generated CASes may be processed in parallel in
different threads.  It has a few warts, but should be generally much
simpler to use than UIMA-AS if you do not need fancy things like cluster
deployment.
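
  To illustrate the general idea of processing a multiplier's outputs in
parallel, here is a minimal sketch in plain Java.  It is not code from the
ASB package and uses no UIMA API: strings stand in for CASes, and the names
ParallelFanOutSketch, multiply and processInParallel are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only: plain strings stand in for CASes, and
// none of these names come from UIMA or from the YodaQA ASB.
public class ParallelFanOutSketch {

    // Stand-in for a CAS multiplier: one input yields several outputs.
    static List<String> multiply(String input) {
        List<String> out = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            out.add(token);
        }
        return out;
    }

    // Submit each generated item as a task on a shared thread pool and
    // collect the results in the order the items were generated.
    static List<String> processInParallel(String input, int threads,
            Function<String, String> downstream) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String item : multiply(input)) {
                Callable<String> task = () -> downstream.apply(item);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that item is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            processInParallel("good ideas good data", 4, String::toUpperCase));
    }
}
```

  All tasks run inside one JVM, so anything the downstream step needs
(models, resources) is loaded only once and shared between the threads.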

  It even has some documentation now.  Find it at:

	https://github.com/brmson/yodaqa/tree/master/src/main/java/cz/brmlab/yodaqa/flow/asb

  (Right now, it just lives as part of my YodaQA software; simply copy
that directory to your project.  I can spin off the package properly
if there is enough interest in it.  It shares the YodaQA licence
statement, i.e. ASL2.)


On Wed, May 20, 2015 at 03:27:20AM +0200, Petr Baudis wrote:
>   I'm looking into ways to run a part of my pipeline multi-threaded:
..snip..
>   (i) I'm using UIMAfit heavily, and multiple CAS multipliers and
> mergers (even within the parallel branches).  So I can't use CPE.
> 
>   (ii) I need multi-threading, not separate processes.  (I have just
> a meager 24G RAM (sigh) and one Java process with all the linguistic
> models and stuff loaded takes 3GB RAM.  So I really need to load these
> resources to memory only once.)
..snip..
>   However, (before actually trying) it still seems to me to be much
> easier to rewrite a piece of the stock ASB than to use UIMA-AS with a
> complex pipeline constructed by UIMAfit...  So I think I will try that
> first (and report back).

  Whew, this was not so easy!  It took a good few days (and a few
start-overs) to write and debug, and I learnt more about UIMAj internals
than I ever cared to. ;-)  But I think I'm still happier with the result
than if I had used UIMA-AS, and it doesn't seem to deadlock or crash
anymore, even on (IMHO) a fairly massive pipeline.

  (What I'm bothered by the most at this point is the fixed-size CAS
pool, though there are a few more issues; I tried to document them all
as well.)
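
  For readers unfamiliar with why a fixed pool size matters, here is a
minimal sketch in plain Java.  It is not the CAS pool used by UIMA or by
this ASB: StringBuilder objects stand in for CASes, and FixedPoolSketch,
acquire and release are hypothetical names.  It only shows the general
behaviour of a fixed-size pool: once every object is checked out, the next
request has to wait until one is released.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: StringBuilder objects stand in for CASes.
public class FixedPoolSketch {
    private final BlockingQueue<StringBuilder> free;

    // Pre-allocate exactly `size` objects; the pool never grows.
    FixedPoolSketch(int size) {
        free = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            free.add(new StringBuilder());
        }
    }

    // Returns a pooled object, or null if none is released within the timeout.
    StringBuilder acquire(long timeoutMillis) throws InterruptedException {
        return free.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Reset the object and hand it back so a waiting caller can proceed.
    void release(StringBuilder item) {
        item.setLength(0);
        free.add(item);
    }

    public static void main(String[] args) throws InterruptedException {
        FixedPoolSketch pool = new FixedPoolSketch(2);
        StringBuilder a = pool.acquire(100);
        StringBuilder b = pool.acquire(100);
        // The pool is now empty, so this request times out.
        System.out.println("third acquire got: " + pool.acquire(100));
        pool.release(a);
        System.out.println("after release, got one: " + (pool.acquire(100) != null));
        pool.release(b);
    }
}
```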

  P.S.: Would there be any interest in merging this into UIMA proper,
or at least in cleaning up some UIMA API bits to simplify and future-proof
the external package?  I admit up-front that I probably won't have time
to do all that work myself, but I'd be happy to cooperate with someone.

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
