uima-user mailing list archives

From "Julien Nioche" <lists.digitalpeb...@gmail.com>
Subject Re: running aggregate engine within CPE and client code
Date Thu, 14 Aug 2008 14:14:31 GMT
Hi Eddie,

Thank you for your message. Yes, the profiling includes everything in my
client code, including the I/O.
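
One way to check how much of the measured time is I/O is to time the read on its own. The sketch below is only an illustration using the plain JDK, not part of the original client code: the class name `ReadTimingSketch`, the helper `roundTrip`, and the choice of UTF-8 are all assumptions. It also reads the file with `Files.readAllBytes`, because a single `InputStream.read(byte[])` call (as in the snippet quoted below) is not guaranteed to fill the buffer, and `new String(byte[])` decodes with the platform default charset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTimingSketch {

    /** Reads the whole file and decodes it as UTF-8 (the charset is an assumption). */
    static String readDocument(Path target) {
        try {
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: writes text to a temp file and reads it back. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("doc", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readDocument(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: ReadTimingSketch <file>");
            return;
        }
        Path target = Paths.get(args[0]);

        long start = System.nanoTime();
        String document = readDocument(target);
        long readNanos = System.nanoTime() - start;

        // The real client would call tcas.setDocumentText(document) and
        // controller.process(tcas) here, timed separately from the read.
        System.out.println("read " + document.length() + " chars in "
                + (readNanos / 1_000_000.0) + " ms");
    }
}
```

If the read time turns out to be small compared with the total, the difference between the two setups has to come from the processing call rather than from the I/O.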

I checked that casPoolSize="1" in my CPM config file. Setting
casPoolSize="3" in the config file makes virtually no difference, which
means that (a) loading my 2000 documents in the same thread or in a separate
one makes no difference or (b) this parameter is not taken into account at
all.
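
To make the pool-size point from Eddie's message (quoted below) concrete, here is a minimal sketch in plain Java. It does not use the UIMA API: the class `CasPoolSketch`, the token objects standing in for CASes, and the 2 ms sleep standing in for processing are all assumptions for illustration. It shows only the general mechanism: when every unit of work must first borrow an object from a fixed-size pool, the pool size caps how many units can be in flight at once, however many threads exist.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CasPoolSketch {

    /**
     * Pushes `documents` fake work items through `workers` threads. Each item
     * must borrow a token from a pool of `poolSize` tokens before it runs.
     * Returns the largest number of items that were in flight at the same time.
     */
    static int maxInFlight(int poolSize, int workers, int documents) {
        BlockingQueue<Object> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new Object()); // stand-in for a pooled CAS
        }

        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(workers);

        for (int d = 0; d < documents; d++) {
            executor.submit(() -> {
                try {
                    Object token = pool.take(); // blocks until a token is free
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(2); // stand-in for processing one document
                    } finally {
                        inFlight.decrementAndGet();
                        pool.put(token); // return the token to the pool
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("pool size 1, peak in flight: " + maxInFlight(1, 3, 20));
        System.out.println("pool size 3, peak in flight: " + maxInFlight(3, 3, 20));
    }
}
```

With a pool of one token the peak can never exceed one, even with three worker threads. With three tokens the peak can reach three, but whether that shortens the total run depends on whether the stages actually have work that can overlap.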

With an aggregate engine: is each primitive engine executed in a separate
thread, or is the whole aggregate run in the same thread?

Thank you for your help

Julien

2008/8/14 Eddie Epstein <eaepstein@gmail.com>

> Hi Julien,
>
> Using default settings, the CPM will run the collection reader in one
> thread, each processing pipeline in another, and finally another
> thread for the Cas consumers. These threads can only run concurrently
> if there are enough CASes. A Cas pool size of 1 limits all work to one
> thread at a time.
>
> Does your profile take into account the I/O time reading the documents?
>
> Eddie
>
> On Wed, Aug 13, 2008 at 10:03 AM, Julien Nioche
> <lists.digitalpebble@gmail.com> wrote:
> > Hi,
> >
> > I am slightly puzzled by the following case. I have integrated an aggregate
> > engine into my code in a very straightforward way:
> >
> >  // reset the tcas for the next document
> >  tcas.reset();
> >
> >  InputStream fis = new BufferedInputStream(new FileInputStream(target));
> >  byte[] contents = new byte[(int) target.length()];
> >  fis.read(contents);
> >  fis.close();
> >
> >  String document = new String(contents);
> >
> >  tcas.setDocumentText(document);
> >  tcas.setDocumentLanguage("en");
> >
> >  controller.process(tcas);
> >
> > Using the aggregate engine from the CPM is more than 10x faster than my
> > client code; both are running in a single thread. I profiled my application
> > and found that the slowest part is
> >
> > 87.9% - 50,781 ms
> > org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process
> >
> > i.e. the time is not spent in other parts of my code but in the process()
> > method.
> >
> > I get a similar difference even when setting casPoolSize="1" in my CPE
> > descriptor. Needless to say, I'd like to get the same type of
> > performance in both cases. Any idea of what might be the cause?
> >
> > Thanks
> >
> > Julien
> >
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com
> >
>



-- 
DigitalPebble Ltd
http://www.digitalpebble.com
