From: Marshall Schor
To: uima-dev
Date: Wed, 01 Oct 2008 15:21:12 -0400
Subject: Another interesting potential speedup

Profiling certainly shows unusual places you'd never think to look :-)

This may be a bit of an anomaly, but we have a scale-out test for uima-as that sends large numbers of CASes over the wire (the test runs in multiple JVMs on one machine, so there are no network delays). We're running it with essentially empty CASes, just to see where the other overhead is.

We expected that things like deserialization would not show up, because the CASes were empty. However, deserialization was the biggest time consumer. Looking into this, it turns out that (in our particular case) 90% of the deserialization time was spent creating a new XML reader (the call XMLReaderFactory.createXMLReader).

A quick search on the internet turned up this link:

http://www.ibm.com/developerworks/xml/library/x-perfap2.html

which suggests this can indeed be a bottleneck, and that it can be avoided by reusing the same XMLReader object instead of throwing it away and getting a new one on every call.

Making this thread-safe would take some work (pooling, etc.), but it might be a good thing to do, unless small but non-empty CASes turn out to bottleneck in some other way that swamps this measurement.

Of course, this only applies to transports that use XML-style serialization/deserialization.

-Marshall
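The reuse-plus-pooling idea above might look roughly like the following sketch. It is not UIMA's code: the class `XMLReaderPool` and its `borrow`/`release`/`createdCount` methods are hypothetical names made up for illustration. It relies on the SAX contract that an `XMLReader` may be reused once a parse has completed, but never by two threads at once. A thread-safe queue hands each reader to one caller at a time, and `XMLReaderFactory.createXMLReader` (deprecated since Java 9, but still present) runs only when no idle reader is available. The pool is unbounded, and a `ThreadLocal<XMLReader>` would be a simpler alternative if the number of threads is fixed.

```java
import java.io.StringReader;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/** Hypothetical, unbounded pool that lets callers reuse XMLReader instances. */
public class XMLReaderPool {
    // Idle readers; ConcurrentLinkedQueue is safe to poll/offer from many threads.
    private final ConcurrentLinkedQueue<XMLReader> idle = new ConcurrentLinkedQueue<>();
    // How many readers were actually created (the expensive step we want to avoid).
    private final AtomicInteger created = new AtomicInteger();

    /** Returns an idle reader, creating a new one only if none is available. */
    @SuppressWarnings("deprecation")
    public XMLReader borrow() throws SAXException {
        XMLReader reader = idle.poll();
        if (reader == null) {
            reader = XMLReaderFactory.createXMLReader();
            created.incrementAndGet();
        }
        return reader;
    }

    /** Returns a reader to the pool; call only after its parse has finished. */
    public void release(XMLReader reader) {
        // Drop references to per-call handlers so they can be garbage collected.
        DefaultHandler blank = new DefaultHandler();
        reader.setContentHandler(blank);
        reader.setErrorHandler(blank);
        idle.offer(reader);
    }

    public int createdCount() {
        return created.get();
    }

    public static void main(String[] args) throws Exception {
        XMLReaderPool pool = new XMLReaderPool();
        // Simulate deserializing many tiny documents, one after another.
        for (int i = 0; i < 1000; i++) {
            XMLReader reader = pool.borrow();
            try {
                reader.setContentHandler(new DefaultHandler());
                reader.parse(new InputSource(new StringReader("<CAS/>")));
            } finally {
                pool.release(reader);
            }
        }
        // Single-threaded, so the one reader is reused every time.
        System.out.println("readers created: " + pool.createdCount());
    }
}
```

With a single thread, the loop creates exactly one reader instead of 1000. Under concurrency, the pool grows to roughly the peak number of simultaneous borrowers.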