From: Marshall Schor
Date: Tue, 18 May 2010 09:57:25 -0400
To: user@uima.apache.org
Subject: Re: the performance of UIMA AS
Message-ID: <4BF29CC5.7040806@schor.com>

Hi,

Here are just a few general observations.

A generally useful check: while the tests are running, examine the CPU % busy on the various machines being used. If it is not 100%, then look for a bottleneck somewhere...

If you're running on one machine, then you will probably only see a speedup if that machine is multi-core, or if the annotators are doing a lot of I/O. In your case, the annotators do no I/O, so you would need to be on a multi-core machine. Once you scale past the number of cores, there's no further speedup possible, I think, for the main pipeline.

The timing measurements below are, I believe, wall-clock measures, not CPU time.

If you do manage to get scaleout, the overall performance in this case is probably going to be dictated by the rate at which your collection reader can send CASes into the pipeline. In many of our tests, where we're deploying on a network of machines, we find that to load up the pipeline we have to pre-read all the test CASes into memory ahead of time, and then have the driver program send them as fast as it can, in order to create a reasonable load.

HTH. -Marshall

On 5/18/2010 7:46 AM, LinTong wrote:
> Hello everybody
>
> Now I am investigating UIMA AS. I'm very confused by the poor
> performance of UIMA-AS. I run the example AS descriptor
> MeetingDetectorTAE. Whether I use
> Deploy_MeetingDetectorTAE_3MeetingAnnotators.xml or
> Deploy_MeetingDetectorTAE_Sync_3Instances.xml, there is no speedup at
> all.
> Also I tried Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml and
> deployed several instances of the service RemoteRoomNumber. But still no
> speedup. My sample includes 450 documents. Actually MeetingDetectorTAE
> takes approx. 1000 ms in CPE. Deploy_MeetingDetectorTAE.xml takes 5000 ms
> in UIMA AS while all components are on the same machine. If I run
> Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml and the service
> RemoteRoomNumber on different computers, it takes almost 20000 ms. I
> know there is overhead, including de/serialisation, but there is no
> reason that the performance is so poor. Does anybody have an idea about
> my problem? Did I make any stupid mistake?
>
> BTW, when I enable the flag named async, the system gives the following
> debug information back. The analysis time and idle time seem quite
> strange. Does my AE only take ca. 280 ms? (The collection reader I used
> takes ca. 2000 ms.)
>
> INFO: Controller: [Meeting Detector TAE] Delegate < TAE>> Stats:
> Total Number CASes Processed: 257
> Total CAS Deserialization Time: 327,602 ms
> Total CAS Serialization Time: 93,601 ms
> Total Time Spent In Analysis: 280,802 ms
> Max Serialization Time: 15,6 ms
> Max Deserialization Time: 15,6 ms
> Max Analysis Time: 202,801 ms
> Total Idle Time: 1.625,275 ms
> Completed 451 documents; 593984 characters
> Time Elapsed : 4808 ms
>
> Thank you so much if somebody could help me!
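
The "pre-read everything, then send as fast as possible" approach described in the reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not UIMA AS code: `PreloadedLoadDriver`, `preload`, `sendAll` and `usefulWorkers` are hypothetical names, plain strings stand in for CASes, and the `send` callback stands in for whatever call actually hands a CAS to the deployed service. The point is that the timed phase contains no reading at all, and that asking for more workers than cores is unlikely to help a CPU-bound pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical driver: strings stand in for CASes, `send` for the real client call.
final class PreloadedLoadDriver {

    // Phase 1 (untimed): build every test document in memory up front.
    static List<String> preload(int count) {
        List<String> docs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            docs.add("document " + i);
        }
        return docs;
    }

    // Phase 2 (timed): hand the preloaded documents to `workers` threads
    // and return the elapsed wall-clock time in milliseconds.
    static long sendAll(List<String> docs, int workers, Consumer<String> send) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        long start = System.nanoTime();
        for (String doc : docs) {
            pool.execute(() -> send.accept(doc));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // For CPU-bound annotators, workers beyond the core count add no speedup.
    static int usefulWorkers(int requested) {
        return Math.min(requested, Runtime.getRuntime().availableProcessors());
    }
}
```

Calling `sendAll(preload(450), usefulWorkers(3), doc -> { /* send to the service */ })` would time only the send phase. If the machines still show well under 100% CPU busy while this runs, the bottleneck is probably somewhere other than the annotators.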