From: holmberg2066@comcast.net (greg@holmberg.name)
To: uima-user@incubator.apache.org
Subject: Why XmiCasSerializer is slow
Date: Wed, 11 Jul 2007 01:44:05 +0000

I had previously described that when I used XmiCasSerializer with many (10) concurrent
AnalysisEngines, my throughput dropped to about half and did not scale up.

I did some profiling of my code using JProbe, and I think I've found the problem.

I discovered that my application spent 64% of its elapsed time in XmiCasSerializer and its child methods. Within that, one method rose to the top: 72% of elapsed time was spent in TypeSystemImpl.ll_isValidTypeCode(). In fact, this exceeded the time spent in XmiCasSerializer (114%). This in turn was almost all in SymbolTable.getSymbol(). That method was called over 17 million times in my application, which spent 72% of its elapsed time in this one method. 99.9% of that time was spent in the method itself, and not in its children (Vector.get(int) was the highest child, at 0.1%).

I'm not exactly sure why this method takes so long. I suspect it's a concurrency issue. I see a synchronized block in the set() method, so that would be something to look into. Given that some of my AnalysisEngines may be inserting annotations while others are executing XmiCasSerializer, I can see potential for conflict.

Hopefully, these clues will be enough for someone familiar with the code to figure it out.

Greg Holmberg
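
One way to check the contention hypothesis outside of UIMA is a small stand-alone experiment like the sketch below. It is only an illustration under stated assumptions: SharedTable is a hypothetical stand-in with synchronized set() and get() methods, not the actual SymbolTable code, and the class and method names are made up for this example. Each worker thread performs a fixed number of lookups and then reads its own blocked count from the standard ThreadMXBean, which reports how often the thread had to wait to enter a monitor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ContentionSketch {
    // Hypothetical stand-in for a shared lookup table; NOT the UIMA SymbolTable.
    static class SharedTable {
        private final List<String> symbols = new ArrayList<>();

        synchronized int set(String s) {
            symbols.add(s);
            return symbols.size() - 1;
        }

        synchronized String get(int i) {
            return symbols.get(i);
        }
    }

    // Starts `threads` workers that each call get() `lookups` times.
    // Returns {lookups completed, total monitor-blocked count across workers}.
    static long[] run(int threads, int lookups) throws InterruptedException {
        SharedTable table = new SharedTable();
        for (int i = 0; i < 100; i++) {
            table.set("type" + i);
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        AtomicLong done = new AtomicLong();
        AtomicLong blocked = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < lookups; i++) {
                    if (table.get(i % 100) != null) {
                        done.incrementAndGet();
                    }
                }
                // Read the count while the thread is still alive;
                // getThreadInfo returns null for terminated threads.
                long id = Thread.currentThread().getId();
                blocked.addAndGet(mx.getThreadInfo(id).getBlockedCount());
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return new long[] { done.get(), blocked.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(10, 100_000);
        System.out.println("lookups=" + r[0] + " blockedCount=" + r[1]);
    }
}
```

The blocked count varies from run to run, so only its rough size is meaningful. A count that grows sharply as the number of threads increases would be consistent with lock contention, while a count near zero would point toward some other cause.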