uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Multi-threaded UIMA ParallelStep
Date Wed, 20 May 2015 11:56:33 GMT
Parallel-step currently only works with remote delegates. The other
approach, using CasMultipliers, allows an arbitrary amount of parallel
processing in-process. A CM would create a separate CAS for each delegate
intended to run in parallel, and use a feature structure to hold a unique
identifier in each child CAS which a custom flow controller would use to
direct these CASes to the desired delegates. Results for the parallel flows
could be merged in a CasConsumer back into the parent CAS or to some other

Some other key concepts here are the CasCopier, which can be used to
efficiently copy large amounts of CAS content from one CAS to another, and
"process-parent-last" which can be specified for a CasMultiplier so that
further processing of a parent CAS will not continue until all of its
children have completed processing.
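
The shape of that approach can be modelled in plain Java. This is only a
rough sketch of the idea, not UIMA code: the class ParallelBranchSketch,
the Child type, and the multiply() and process() methods are hypothetical
stand-ins for a CasMultiplier, a child CAS carrying its branch identifier,
and a flow controller plus merge step. A thread pool stands in for
whatever actually runs the delegates concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java model of the pattern; none of these types are UIMA classes.
class ParallelBranchSketch {

    // Stand-in for a child CAS: a copy of the parent's content plus the
    // unique branch identifier that the routing step reads.
    static final class Child {
        final String branchId;
        final String content;

        Child(String branchId, String content) {
            this.branchId = branchId;
            this.content = content;
        }
    }

    // Stand-in for the CasMultiplier: one child per parallel branch.
    static List<Child> multiply(String parentContent, Iterable<String> branchIds) {
        List<Child> children = new ArrayList<>();
        for (String id : branchIds) {
            children.add(new Child(id, parentContent));
        }
        return children;
    }

    // Stand-in for the flow controller, the delegates, and the merge step.
    static Map<String, String> process(String parentContent,
                                       Map<String, Function<String, String>> delegates) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, delegates.size()));
        try {
            // Route each child to the delegate named by its branch identifier.
            Map<String, Future<String>> pending = new TreeMap<>();
            for (Child child : multiply(parentContent, delegates.keySet())) {
                Function<String, String> delegate = delegates.get(child.branchId);
                Callable<String> task = () -> delegate.apply(child.content);
                pending.put(child.branchId, pool.submit(task));
            }
            // Analogue of "process-parent-last": the parent's merged result is
            // only produced once every child has finished.
            Map<String, String> merged = new TreeMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                merged.put(entry.getKey(), entry.getValue().get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a parallel branch failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> delegates = new TreeMap<>();
        delegates.put("lower", s -> s.toLowerCase());
        delegates.put("upper", s -> s.toUpperCase());
        System.out.println(process("Some Text", delegates));
    }
}
```

In a real pipeline the copy into each child would presumably go through
CasCopier, and the branch identifier would live in a feature structure
rather than a plain field.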


On Tue, May 19, 2015 at 9:27 PM, Petr Baudis <pasky@ucw.cz> wrote:

>   Hi!
>   I'm looking into ways to run a part of my pipeline multi-threaded:
>                 .-> Multip0 -> A1 -> Multip1 -> A2 ->.
>   reader -> A0 <                                      > CASmerger
>                 `-> Multip2 -> A3 ------------> A2 ->'
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 ParallelStep is generated for each branch
>                 in a custom flow controller
> Basically, I need a way to tell UIMA to run each ParallelStep (which
> normally just denotes the CAS flow) truly in parallel.  I have two
> constraints:
>   (i) I'm using UIMAfit heavily, and multiple CAS multipliers and
> mergers (even within the parallel branches).  So I can't use CPE.
>   (ii) I need multi-threading, not separate processes.  (I have just
> a meager 24G RAM (sigh) and one Java process with all the linguistic
> models and stuff loaded takes 3GB RAM.  So I really need to load these
> resources to memory only once.)
>   I looked into UIMA-AS, including Richard's helpful DKpro-lab code
> sample, but I can't figure out how to make it work reasonably with
> a *complex* UIMAfit pipeline that spans many branches and many
> analysis engines - it seems to me that I would need some centralized
> place in which to specify it, and basically completely rewrite my pipeline
> building code (for the worse, in my impression).
>   ...and I'm not even sure, from reading UIMA-AS code, if I could make
> it run in multiple threads within a single process!  From comments in
> org/apache/uima/aae/controller/AggregateAnalysisEngineController_impl.java:parallelStep()
> I'm getting the impression that non-remote AEs will be executed serially
> after all, not in parallel.  Is that correct?
>   So going back to the original UIMA code, it seems to me that the thing
> to do would be replacing ASB_impl with my own copy (inheritance would
> not cut it the way it's coded), AggregateAnalysisEngine_impl with my own
> specialization or copy (as ASB_impl usage is hardcoded there) and
> rewrite the while() loop in ParallelStep case of ASB's
> processUntilNextOutputCas() to run in parallel.  And hope I didn't miss
> any catch...
>   Is there an option I'm missing?  Any hints would be really
> appreciated!
>   Thanks,
>                                 Petr Baudis
