uima-user mailing list archives

From Benjamin Sznajder <benj...@il.ibm.com>
Subject Re: Multi-threading with a CAS Multiplier.
Date Thu, 19 Jul 2007 04:25:16 GMT
Hi Adam,

Thanks for your answer. It was clear, do not worry ;-)

Let me restate my question and describe the scenario we are interested
in:

Suppose we have a video document containing, for example, a 10-minute
video.

We are interested in a CPE that:
- splits the document into 10 CASes: 5 CASes each holding a 2-minute part
of the video and 5 CASes each holding a 2-minute part of the speech.
- analyses the CASes representing Video via the relevant pipeline and the
CASes representing Speech via their own relevant pipeline.
- merges these 10 CASes and sends the result to the CAS Consumer.
In addition, we would like to gain parallelism. An obvious way to gain it
would be for one pipeline to analyse the Video part while the second
pipeline analyses the Speech part in parallel.
That is our scenario.


Our first implementation was as I described:
A CASMultiplier is hidden in an Aggregate Engine named AE1.
This AE1 contains
      - a CASMultiplier
      - a second aggregate, AggregateEngine2, composed of different
pipelines and controlled by a FlowController.
      - a CAS Merger
      - a CAS Consumer


The problem is exactly as you pointed out: "What happens when you set the
processingUnitThreadcount to 3 is that
you will get three instances of AE1 (and everything inside it).  Then
all three instances of AE1 will be run in parallel on different CASes."
So we do not really get parallelism among the CASes of a single document.
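
To make the limitation concrete, here is a rough toy model in plain Java.
It is only a sketch: plain threads stand in for the CPM processing units,
and every class and method name below is hypothetical, not a UIMA API.
It assumes, as described in the quoted text, that each AE1 instance handles
the segments of its own document one after another on a single thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: none of these names are UIMA APIs.
class BlackBoxAe1Sketch {

    // Stand-in for AE1: splits one document into segments and "analyses"
    // them sequentially on the calling thread. Returns the names of the
    // threads that touched the segments of this document.
    static Set<String> runAe1(String document, int segments) {
        Set<String> threadsUsed = new HashSet<>();
        for (int i = 0; i < segments; i++) {
            // Analysis of segment i would happen here, one at a time.
            threadsUsed.add(Thread.currentThread().getName());
        }
        return threadsUsed;
    }

    // Stand-in for the CPM: one black-box AE1 call per document,
    // distributed over a fixed pool of worker threads.
    static List<Set<String>> runCpm(List<String> documents, int threadCount, int segments) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<Set<String>>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<Set<String>> task = () -> runAe1(doc, segments);
                futures.add(pool.submit(task));
            }
            List<Set<String>> result = new ArrayList<>();
            for (Future<Set<String>> f : futures) {
                result.add(f.get());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // One 10-minute document, three threads, ten segments.
        List<String> docs = new ArrayList<>();
        docs.add("video-10min");
        System.out.println(runCpm(docs, 3, 10));
    }
}
```

In this model, a single document keeps exactly one thread busy no matter
how large the pool is; the extra threads only help when several documents
arrive at once.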


If I understand correctly, the problem comes from the fact that the CAS
Multiplier must be hidden in an Aggregate.
I would like to get your opinion about the following workaround:
Why don't we move the steps done by the CAS Multiplier into the Collection
Reader? The collection reader would read a 10-minute document and create
10 CASes: our 5 video CASes and 5 speech CASes, each covering 2 minutes.
If we do this, then setting processingUnitThreadCount to 3 (or more)
will create three (or more) instances of our AggregateEngine2, and we
would get real parallelization across our 10 CASes. Am I missing something?
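
A rough sketch of this workaround, again as a plain Java toy model rather
than UIMA code. All names here (Segment, readSegments, analyse, processAll)
are hypothetical; a thread pool stands in for the processing units, and a
simple string stands in for the analysis result of each CAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: none of these names are UIMA APIs.
class ReaderSplitSketch {

    // Hypothetical stand-in for one CAS emitted by the collection reader.
    static final class Segment {
        final String kind;      // "video" or "speech"
        final int startMinute;  // where this part begins in the document

        Segment(String kind, int startMinute) {
            this.kind = kind;
            this.startMinute = startMinute;
        }
    }

    // Stand-in for the collection reader: does the splitting itself and
    // emits one video and one speech segment per time slice.
    static List<Segment> readSegments(int totalMinutes, int segmentMinutes) {
        List<Segment> out = new ArrayList<>();
        for (int start = 0; start < totalMinutes; start += segmentMinutes) {
            out.add(new Segment("video", start));
            out.add(new Segment("speech", start));
        }
        return out;
    }

    // Stand-in for AggregateEngine2: routes each segment to a pipeline.
    static String analyse(Segment s) {
        String pipeline = s.kind.equals("video") ? "videoPipeline" : "speechPipeline";
        return pipeline + "@" + s.startMinute;
    }

    // Stand-in for the CPM: every segment is an independent unit of work,
    // so the pool can process segments of the same document concurrently.
    // Results are collected in submission order.
    static List<String> processAll(List<Segment> segments, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Segment s : segments) {
                Callable<String> task = () -> analyse(s);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A 10-minute document cut into 2-minute parts, three threads.
        System.out.println(processAll(readSegments(10, 2), 3));
    }
}
```

Note that in this model the step that gathers the 10 results runs after
all segments are done; where the real merge step would live in such a
setup is left open here.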

I hope my note was clear enough....

Best regards,
Benjamin


From: "Adam Lally" <alally@alum.rpi.edu>
Sent by: lally.adam@gmail.com
To: uima-user@incubator.apache.org
Date: 18/07/2007 17:00
Subject: Re: Multi-threading with a CAS Multiplier.
Please respond to: uima-user@incubator.apache.org

On 7/18/07, Benjamin Sznajder <benjams@il.ibm.com> wrote:
> Hi Eddie,
>
> Thank you for your so rapid answer.
>
> Indeed, my CASMultiplier is hidden in an Aggregated Engine  named AE1.
> This AE1 contains
>       - a CASMultiplier
>       - a second AggregateEngine2 controlled by a FlowController composed
> of different pipelines.
>       - a CAS Merger
>       - a CAS Consumer
>
>
> The whole CPM is composed by a CollectionReader, and the AE1
>
> If I define the AggregateEngine2 (i.e the AE that does not contain the
> CASMultiplier) with multipleDeploymentAllowed=true, with each Analysis
> Engine in it, defined also with multipleDeploymentAllowed=true (including
> the FlowController)
> And if, in addition, I set <casProcessors casPoolSize="5"
> processingUnitThreadCount="3">
>
> In such configuration, do I run in parallel on the pipelines contained in
> AggregateEngine2?

What happens when you set the processingUnitThreadcount to 3 is that
you will get three instances of AE1 (and everything inside it).  Then
all three instances of AE1 will be run in parallel on different CASes.

So that means that the CASes coming out of a particular instance of
the CasMultiplier will not be processed in parallel with each other,
but there will be parallelization _across_ CAS Multipliers.

Hopefully that was clear... the key idea is that the CPM is unaware of
CAS Multipliers and will treat AE1 like a black-box 1-CAS-in,
1-CAS-out AE.  So it cannot parallelize _inside_ AE1, but it can
certainly replicate AE1 and run each instance on a separate thread.

-Adam