uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Performance of CPE/CPM vs AS
Date Tue, 16 Aug 2011 22:06:55 GMT
> I get it now. When I put the CR into the aggregate AE and ran
> 'runRemoteAsyncAE.sh' without the '-c' flag, it was within a few seconds of
> being as fast as the CPE.
That's good :)
The remaining few seconds are because the CPE/CPM runs the CR in a
separate thread from the AE, whereas the simple UIMA AS deployment runs
them in the same thread. The times should be closer if the aggregate is
deployed in "async" mode. Also, if the AE is thread-safe, try increasing
its number of instances.
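To illustrate the thread point above, here is a plain-Java analogy of the two arrangements. It is not UIMA code. `ReaderThreadSketch`, `annotate`, and the string "documents" are hypothetical stand-ins: `annotate` plays the role of a thread-safe AE, and the loop over `collection` plays the role of the CR. In `sameThread`, each read waits until the previous document has been processed. In `readerInOwnThread`, a reader thread keeps a bounded queue filled while `instances` worker threads drain it, so reading overlaps with processing. The `instances` parameter is only a loose analogue of deploying several instances of a thread-safe AE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical analogy only: no UIMA classes are used.
class ReaderThreadSketch {

    // Sentinel telling a worker that the collection is exhausted.
    private static final String END = "\u0000END";

    // Stand-in for a thread-safe AE: no shared mutable state.
    static String annotate(String doc) {
        return doc.toUpperCase();
    }

    // Reader and "AE" share one thread: read, process, read, process...
    static List<String> sameThread(List<String> collection) {
        List<String> out = new ArrayList<>();
        for (String doc : collection) {
            out.add(annotate(doc));
        }
        return out;
    }

    // Reader runs in its own thread and feeds `instances` worker threads
    // through a bounded queue, so reading overlaps with processing.
    static List<String> readerInOwnThread(List<String> collection, int instances) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(instances + 1);
        try {
            Future<?> reader = pool.submit(() -> {
                for (String doc : collection) {
                    queue.put(doc);
                }
                for (int n = 0; n < instances; n++) {
                    queue.put(END); // one sentinel per worker
                }
                return null;
            });
            List<Future<?>> workers = new ArrayList<>();
            for (int i = 0; i < instances; i++) {
                workers.add(pool.submit(() -> {
                    for (String doc = queue.take(); !END.equals(doc); doc = queue.take()) {
                        out.add(annotate(doc));
                    }
                    return null;
                }));
            }
            reader.get();
            for (Future<?> w : workers) {
                w.get();
            }
            return new ArrayList<>(out); // completion order, not collection order
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("alpha", "beta", "gamma", "delta");
        System.out.println(sameThread(docs));
        System.out.println(readerInOwnThread(docs, 2));
    }
}
```

With more than one worker, results come back in completion order rather than collection order. With a single worker, the FIFO queue preserves the original order.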

> Thanks for the pointers. One take-away for me seems to be that UIMA AS might
> not be a means to scale for performance if you have to run service instances
> remotely.
Running UIMA AS services remotely can provide almost linear scale-out
performance, as long as the data flow is designed carefully.
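One data flow consistent with the suggestion further down this thread sends small CASes that each point at a subset of the collection, and lets a CasMultiplier inside the service expand each one. That flow can be sketched in plain Java as an analogy, not UIMA code. `WorkItem`, `Expander`, and the integer document ids are hypothetical, and the modulo partitioning mirrors Chuck's `WHERE ID % 25 = x` query. A real UIMA CasMultiplier would implement `process`, `hasNext`, and `next` and emit CASes rather than integers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical analogy only: integer ids stand in for documents and no
// UIMA classes are used.
class PartitionedFeedSketch {

    // What the client sends: a small work item naming one partition,
    // instead of the documents themselves.
    static final class WorkItem {
        final int partition;
        final int numPartitions;

        WorkItem(int partition, int numPartitions) {
            this.partition = partition;
            this.numPartitions = numPartitions;
        }

        @Override
        public String toString() {
            return "ID % " + numPartitions + " = " + partition;
        }
    }

    // Client side: a static set of work items, one per partition.
    static List<WorkItem> workItems(int numPartitions) {
        List<WorkItem> items = new ArrayList<>();
        for (int x = 0; x < numPartitions; x++) {
            items.add(new WorkItem(x, numPartitions));
        }
        return items;
    }

    // Service side: expands one work item into the ids of its partition,
    // one at a time, in the hasNext()/next() style of a CAS multiplier.
    static final class Expander implements Iterator<Integer> {
        private final int[] ids;
        private final WorkItem item;
        private int pos = 0;

        Expander(int[] ids, WorkItem item) {
            this.ids = ids;
            this.item = item;
            skipToMatch();
        }

        // Advance pos to the next id that belongs to this partition.
        private void skipToMatch() {
            while (pos < ids.length
                    && Math.floorMod(ids[pos], item.numPartitions) != item.partition) {
                pos++;
            }
        }

        @Override
        public boolean hasNext() {
            return pos < ids.length;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int id = ids[pos++];
            skipToMatch();
            return id;
        }
    }

    static List<Integer> expand(int[] ids, WorkItem item) {
        List<Integer> out = new ArrayList<>();
        new Expander(ids, item).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        int[] ids = new int[60];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i + 1; // hypothetical ids 1..60
        }
        for (WorkItem item : workItems(25)) {
            System.out.println(item + " -> " + expand(ids, item));
        }
    }
}
```

In this sketch each work item is only two integers, so client-to-service traffic stays small however large a partition is. The documents themselves are produced inside the service.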

Eddie


On Tue, Aug 16, 2011 at 4:43 PM, Charles Bearden
<Charles.F.Bearden@uth.tmc.edu> wrote:
> On 08/16/2011 01:45 PM, Eddie Epstein wrote:
>>>
>>> Thanks again for your reply. I thought that I was deploying the pipeline
>>> in one AS process with the first option for running it:
>>>
>>> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
>>>  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
>>>  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml
>>>
>>> It looks like one process in the output of ps. I'm just surprised that
>>> the performance is so much slower (16x slower).
>>
>> Right, all in one process, but the connection between client and
>> service is the same one used between separate processes. As a quick
>> test, create a new aggregate with these two delegates:
>> SentencesFromDBReader.xml and
>> SbmiUmlsSmallAggregatePlaintextProcessor.xml. Then create a deployment
>> descriptor for this aggregate, say
>> Deploy_OneProcessDictionaryTest.xml, and test it with:
>>
>> runRemoteAsyncAE.sh tcp://localhost:61616 OneProcessQueue \
>>  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_OneProcessDictionaryTest.xml
>>
>> Without a collection reader (no -c option), runRemoteAsyncAE sends a
>> single empty CAS to the service. This kicks off the embedded collection
>> reader in the aggregate, and hopefully you'll see times similar to the CPE.
>
> I get it now. When I put the CR into the aggregate AE and ran
> 'runRemoteAsyncAE.sh' without the '-c' flag, it was within a few seconds of
> being as fast as the CPE.
>
> Thanks for the pointers. One take-away for me seems to be that UIMA AS might
> not be a means to scale for performance if you have to run service instances
> remotely. What I've been doing is running a bunch of CPEs in parallel,
> using the modulo operator in the SQL of the CR to ensure that each CR pulls
> data from its own partition of the collection, e.g.
>
>  SELECT TEXT
>  FROM DOCUMENTS
>  WHERE ID % 25 = x
>
> where each of the 25 instances will have a different number from [0, 1, 2, 3
> …] for its 'x'.
>
> Thanks again to all who responded. I've learned a lot.
>
> Chuck
>
>>> To create a pipeline with an architecture like Figure 5, would I use the
>>> example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of
>>> uima_async_scaleout.pdf for 2.3.1?
>>
>> That would be one way. The important points are 1) to send a CAS that
>> points at some subset of the collection, and 2) to replace the embedded
>> collection reader inside the service with a CasMultiplier that can read
>> that CAS and generate the sub-collection of CASes for the pipeline. Given
>> these two, a static set of CASes to be sent to the service could be
>> created, and runRemoteAsyncAE used to send them.
>>
>> Eddie
>
>
