uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: the performance of UIMA AS
Date Tue, 18 May 2010 13:57:25 GMT

Here are just a few general observations.

A generally useful check: while the tests are running, watch the CPU %
busy on each of the machines being used.  If it is not close to 100%,
then the CPU is not the limiting resource, so look for a bottleneck
somewhere else...

If you're running on one machine, then you will probably only see a
speedup if that machine is a multi-core machine, or if the annotators
are doing a lot of I/O.  In your case, the annotators do no I/O - so
you would need to be on a multi-core machine.  Once you scale past the
number of cores, there's no further speedup possible, I think, for the
main pipeline.
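
As a rough illustration of that ceiling, here is a minimal sketch. It assumes purely CPU-bound annotators and ignores queueing and serialization overhead; the class and method names are made up for this example and are not part of UIMA AS.

```java
// Sketch only: an idealized upper bound, not a measurement of UIMA AS.
final class ScaleoutEstimate {

    /**
     * Best-case speedup when `instances` copies of a CPU-bound annotator
     * share `cores` cores: extra copies beyond the core count add nothing.
     */
    static int idealSpeedup(int instances, int cores) {
        if (instances < 1 || cores < 1) {
            throw new IllegalArgumentException("instances and cores must be >= 1");
        }
        return Math.min(instances, cores);
    }

    public static void main(String[] args) {
        // Number of cores the JVM can see on this machine.
        int cores = Runtime.getRuntime().availableProcessors();
        for (int n = 1; n <= cores + 2; n++) {
            System.out.printf("%d instance(s) on %d core(s): at most %dx%n",
                    n, cores, idealSpeedup(n, cores));
        }
    }
}
```

On a single-core machine this bound is 1x no matter how many annotator instances are deployed, which would match seeing no speedup at all.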

The timing measurements below I believe are wall-clock measures, not

If you do manage to get scaleout, the overall performance in this case
is probably going to be dictated by the rate at which your collection
reader can send CASes into the pipeline.  In many of our tests, where
we're deploying on a network of machines, we find that to load up the
pipeline, we have to pre-read all the test CASes into memory, ahead of
time, and then have the driver program send those as fast as it can, in
order to create a reasonable load.
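
The pre-read approach can be sketched roughly as follows. This is a simplified stand-in, not our actual test driver: documents are plain strings, and the `reader` and `sender` parameters are hypothetical placeholders for the collection reader and for whatever call submits a CAS to the service.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch only: separates "read everything" from "send everything".
final class PreloadedDriver {

    /** Drain the reader into memory; the reader signals the end with null. */
    static List<String> preload(Supplier<String> reader) {
        List<String> docs = new ArrayList<>();
        for (String doc = reader.get(); doc != null; doc = reader.get()) {
            docs.add(doc);
        }
        return docs;
    }

    /**
     * Send the preloaded documents back to back. Returns the elapsed
     * nanoseconds of the send loop only, so reading cost is excluded.
     */
    static long sendAll(List<String> docs, Consumer<String> sender) {
        long start = System.nanoTime();
        for (String doc : docs) {
            sender.accept(doc);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Toy in-memory source standing in for a real collection reader.
        Iterator<String> source = List.of("doc one", "doc two", "doc three").iterator();
        List<String> docs = preload(() -> source.hasNext() ? source.next() : null);

        // Toy sender that just records what it was given.
        List<String> sent = new ArrayList<>();
        long nanos = sendAll(docs, sent::add);
        System.out.println(sent.size() + " documents sent in " + nanos + " ns");
    }
}
```

The point of the split is that the timed loop contains no file or network reads on the driver side, so a slow reader cannot hide how much load the pipeline could actually take.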

HTH.   -Marshall 

On 5/18/2010 7:46 AM, LinTong wrote:
> Hello everybody,
> I am currently investigating UIMA AS and am very confused by its poor
> performance. I ran the example AS descriptor MeetingDetectorTAE.
> Whether I use Deploy_MeetingDetectorTAE_3MeetingAnnotators.xml or
> Deploy_MeetingDetectorTAE_Sync_3Instances.xml, there is no speedup at
> all. I also tried Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml and
> deployed several instances of the RemoteRoomNumber service, but there
> was still no speedup. My sample includes 450 documents.
> MeetingDetectorTAE takes approx. 1000 ms in the CPE.
> Deploy_MeetingDetectorTAE.xml takes 5000 ms in UIMA AS with all
> components on the same machine. If I run
> Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml and the
> RemoteRoomNumber service on a different computer, it takes almost
> 20000 ms. I know there is overhead, including de/serialisation, but
> there is no reason for the performance to be so poor. Does anybody
> have an idea about my problem? Did I make some stupid mistake?
> BTW, when I enable the flag named async, the system prints the
> following debug information. The analysis time and idle time seem
> quite strange. Does my AE really take only ca. 280 ms? (The collection
> reader I used takes ca. 2000 ms.)
> INFO: Controller: [Meeting Detector TAE] Delegate <<Meeting Detector
> TAE>> Stats:
> 	 Total Number CASes Processed: 257
> 	 Total CAS Deserialization Time: 327,602 ms
> 	 Total CAS Serialization Time: 93,601 ms
> 	 Total Time Spent In Analysis: 280,802 ms
> 	 Max Serialization Time: 15,6 ms
> 	 Max Deserialization Time: 15,6 ms
> 	 Max Analysis Time: 202,801 ms
> 	 Total Idle Time: 1.625,275 ms
> Completed 451 documents; 593984 characters
> Time Elapsed : 4808 ms
> Thank you so much if somebody could help me!
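
One note on reading the statistics quoted above: values such as "15,6 ms" and "1.625,275 ms" look like they were printed with German-style number formatting (comma as decimal separator, period as thousands separator). That is an assumption based on the formatting alone. Under it, a small sketch like the following converts the figures to plain milliseconds and adds them up; the class name is made up for this example.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Sketch only: assumes the log figures use German number formatting.
final class DelegateStats {

    /** Parse a figure such as "1.625,275" into a double (1625.275). */
    static double parseMillis(String text) {
        try {
            return NumberFormat.getInstance(Locale.GERMANY)
                    .parse(text.trim())
                    .doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Figures copied from the quoted debug output.
        double deserialization = parseMillis("327,602");
        double serialization = parseMillis("93,601");
        double analysis = parseMillis("280,802");
        double idle = parseMillis("1.625,275");
        int casesProcessed = 257;

        double accounted = deserialization + serialization + analysis;
        System.out.printf(Locale.ROOT, "serialization + analysis: %.3f ms%n", accounted);
        System.out.printf(Locale.ROOT, "idle: %.3f ms%n", idle);
        System.out.printf(Locale.ROOT, "analysis per CAS: %.3f ms%n",
                analysis / casesProcessed);
    }
}
```

Read this way, the reported analysis time is about 280.8 ms in total, and the idle time (about 1625 ms) is more than twice the combined deserialization, serialization and analysis time (about 702 ms).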
