From uima-user-return-388-apmail-incubator-uima-user-archive=incubator.apache.org@incubator.apache.org Wed Jul 11 13:29:16 2007 Return-Path: Delivered-To: apmail-incubator-uima-user-archive@locus.apache.org Received: (qmail 3148 invoked from network); 11 Jul 2007 13:29:16 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 11 Jul 2007 13:29:16 -0000 Received: (qmail 75788 invoked by uid 500); 11 Jul 2007 13:29:18 -0000 Delivered-To: apmail-incubator-uima-user-archive@incubator.apache.org Received: (qmail 75764 invoked by uid 500); 11 Jul 2007 13:29:18 -0000 Mailing-List: contact uima-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: uima-user@incubator.apache.org Delivered-To: mailing list uima-user@incubator.apache.org Received: (qmail 75751 invoked by uid 99); 11 Jul 2007 13:29:18 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Jul 2007 06:29:18 -0700 X-ASF-Spam-Status: No, hits=0.6 required=10.0 tests=DNS_FROM_AHBL_RHSBL,MAILTO_TO_SPAM_ADDR,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (herse.apache.org: domain of lally.adam@gmail.com designates 209.85.146.180 as permitted sender) Received: from [209.85.146.180] (HELO wa-out-1112.google.com) (209.85.146.180) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Jul 2007 06:29:15 -0700 Received: by wa-out-1112.google.com with SMTP id n4so5083692wag for ; Wed, 11 Jul 2007 06:28:54 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=b18JMAmFO5169Ah2wlWUzxJKZi1Hp4AWUCyJdI1EiHUSq+b7aeApemRYeBmYegvhyWQFdKKAJNMjDWyss91HM4DjUCjhFjigYv6QHBrVanKY8Ni/VF95pl6YONIzOCsr7zOVok9nQiVwJz3USAIRt6lfhGVWV7efaPNVFRTni+8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=GFEBDD2r85+aWPtTMrD0hAMagy8sZ+npYQSPS6UQRxJVXJVnPRh5iepGlpehfZDU5QB7OKXw627AsQBl2tgEUH8F3zrxViIWy4ZLB7MVfdHX7oGcc/aF5OBx/WK79rQrZxaVCDTaeiKjn9JmaRMXQA/B19/p80pZmURbtc0keiM= Received: by 10.114.37.1 with SMTP id k1mr5155608wak.1184160534778; Wed, 11 Jul 2007 06:28:54 -0700 (PDT) Received: by 10.114.102.16 with HTTP; Wed, 11 Jul 2007 06:28:54 -0700 (PDT) Message-ID: <2787e08a0707110628j2b80c311lec06d154b54ed8fe@mail.gmail.com> Date: Wed, 11 Jul 2007 09:28:54 -0400 From: "Adam Lally" Sender: lally.adam@gmail.com To: uima-user@incubator.apache.org Subject: Re: Why XmiCasSerializer is slow In-Reply-To: <071120070144.7075.469435E5000AE0F300001BA32205886442C0C0CFCD099D0A0D03040108@comcast.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <071120070144.7075.469435E5000AE0F300001BA32205886442C0C0CFCD099D0A0D03040108@comcast.net> X-Google-Sender-Auth: 5fd14e16c26f8607 X-Virus-Checked: Checked by ClamAV on apache.org On 7/10/07, greg@holmberg.name wrote: > I had previously described that when I used XmiCasSerializer with many (10) concurrent AnalysisEngines, my throughput dropped to about half, and wasn't scaling up. > > I did some profiling of my code using JProbe, and I think I've found the problem. > > I discovered that my application spent 64% of its elapsed time in XmiCasSerializer and it's child methods. Within that, one method rose to the top: 72% of elapsed time was spent in TypeSystemImpl.ll_isValidTypeCode(). In fact, this exceeded the time spent in XmiCasSerializer (114%). > > This in turn was almost all in SymbolTable.getSymbol(). This was called over 17 million times in my application, which spent 72% of its elapsed time in this one method. 99.9% of its time was spent in itself, and not it's children (Vector.get(int) was the highest child, at 0.1%). > > I'm not exactly sure why this method takes so long. I suspect it's a concurrency issue. I see a synchronized block in the set() method, so that would be something to look into. Given that some of my AnalysisEngines may be inserting annotations while others are executing XmiCasSerializer, I can see potential for conflict. > > Hopefully, these clues will be enough for someone familiar with the code to figure it out. > Very Interesting... Vectors are internally synchronized. All the CASes in the CAS Pool share the same instance of the TypeSystemImpl, so they will all synchronize when calling ll_isValidTypeCode(). I wonder if switching the Vector to an ArrayList would help. (Thilo, would that be safe? If set() is itself synchronized, and if nothing else modifies the table, then it seems like it would be. Also perhaps we can drastically reduce the number of times the serializer calls this method. Among other places, it looks like it is called by Type.isArray, which is in turn called by TypeSystem.subsumes. I'd have to run a test with the XmiCasSerializer to see what the real call stack looks like. -Adam