Message-ID: <16d405e0808140714l45413330s5438b484a5b2a48b@mail.gmail.com>
Date: Thu, 14 Aug 2008 15:14:31 +0100
From: "Julien Nioche"
To: uima-user@incubator.apache.org
Subject: Re: running aggregate engine within CPE and client code
References: <16d405e0808130703g2c22f31du3236de0718d2b008@mail.gmail.com>

Hi Eddie,

Thank you for your message. Yes, the profiling includes everything in my
client code, including the I/O.

I checked that casPoolSize="1" in my CPM config file. Setting
casPoolSize="3" in the config file makes virtually no difference, which
means that either (a) loading my 2000 documents in the same thread or in a
separate one makes no difference, or (b) this parameter is not taken into
account at all.

With an aggregate engine: is each primitive engine executed in a separate
thread, or is the whole aggregate run in the same thread?

Thank you for your help.

Julien

2008/8/14 Eddie Epstein

> Hi Julien,
>
> Using default settings, the CPM will run the collection reader in one
> thread, each processing pipeline in another, and finally another
> thread for the Cas consumers. These threads can only run concurrently
> if there are enough CASes. A Cas pool size of 1 limits all work to one
> thread at a time.
>
> Does your profile take into account the I/O time reading the documents?
>
> Eddie
>
> On Wed, Aug 13, 2008 at 10:03 AM, Julien Nioche wrote:
> > Hi,
> >
> > I am slightly puzzled by the following case. I have integrated an
> > aggregate engine into my code in a very straightforward way:
> >
> > // reset the tcas for the next document
> > tcas.reset();
> >
> > InputStream fis = new BufferedInputStream(new FileInputStream(target));
> > byte[] contents = new byte[(int) target.length()];
> > fis.read(contents);
> > fis.close();
> >
> > String document = new String(contents);
> >
> > tcas.setDocumentText(document);
> > tcas.setDocumentLanguage("en");
> >
> > controller.process(tcas);
> >
> > Using the aggregate engine from the CPM is more than 10x faster than my
> > client code; both are running in a single thread. I profiled my
> > application and found that the slowest part is
> >
> > 87.9% - 50,781 ms
> > org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process
> >
> > i.e. the time is not spent in other parts of my code but in the
> > process() method.
> >
> > I get a similar difference even when setting casPoolSize="1" in my CPE
> > descriptor. Needless to say, I'd like to get the same type of
> > performance in both cases. Any idea of what might be the cause?
> >
> > Thanks
> >
> > Julien
> >
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com

--
DigitalPebble Ltd
http://www.digitalpebble.com
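
Eddie's point that the CPM threads "can only run concurrently if there are
enough CASes" can be illustrated with a toy model. The sketch below is not
UIMA code and does not claim to show how the CPM is implemented: plain
Objects stand in for CASes, a bounded queue stands in for the CAS pool, and
the names CasPoolSketch and maxInFlight are hypothetical. It only shows
that, however many worker threads exist, the number of tasks doing work at
once cannot exceed the pool size.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: plain Objects stand in for CASes; no UIMA classes are used.
class CasPoolSketch {

    /**
     * Runs `documents` tasks on `workers` threads. Each task must borrow a
     * token from a pool of `poolSize` tokens before it may do its work.
     * Returns the highest number of tasks seen working at the same time.
     */
    static int maxInFlight(int poolSize, int workers, int documents) {
        BlockingQueue<Object> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new Object());
        }

        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(workers);

        for (int d = 0; d < documents; d++) {
            exec.submit(() -> {
                try {
                    Object cas = pool.take();      // blocks until a "CAS" is free
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);           // stand-in for the real processing
                    } finally {
                        inFlight.decrementAndGet();
                        pool.put(cas);             // hand the "CAS" back to the pool
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        exec.shutdown();
        try {
            exec.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Four worker threads in both runs; only the pool size changes.
        System.out.println("pool size 1, peak in flight: " + maxInFlight(1, 4, 20));
        System.out.println("pool size 3, peak in flight: " + maxInFlight(3, 4, 20));
    }
}
```

With a pool of one token the four workers take turns, which matches the
"one thread at a time" behaviour described above. A larger pool only helps
if the work can actually overlap, so it says nothing by itself about why
process() is slower in the client code than under the CPM.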