uima-user mailing list archives

From holmberg2066@comcast.net (g...@holmberg.name)
Subject Re: Performance bug in XmiCasSerializer?
Date Mon, 02 Jul 2007 21:27:15 GMT

 -------------- Original message ----------------------
From: "Adam Lally" <alally@alum.rpi.edu>
> Greg,
> It doesn't look to me like you're doing anything wrong.
> I did a quick test to try to reproduce this but wasn't able to... I
> may need more information about your set up.
> I created a CPE with the FileSystemCollectionReader,
> PersonTitleAnnotator, and your XmiCasAnnotator.  (I filled in the part
> about generating an identifier with something that checks the
> SourceDocumentInformation annotations put there by the
> FileSystemCollectionReader.)
> On a particular set of documents, with the CPE descriptor's
> processingUnitThreadCount set to 1 I get a total elapsed time of 9.25
> seconds, whereas with the processingUnitThreadCount set to 10 I get a
> total elapsed time of 6.875 seconds.  (This is on a dual-core
> machine.)

With two cores and Hyperthreading on, shouldn't you get about 3X the performance (or at
least 2.67X)? In other words, shouldn't you get an elapsed time of about 3.5 seconds? Doesn't
almost 7 seconds indicate a problem?
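
To make that arithmetic concrete, here is a small Java sketch (the class name `Speedup` is hypothetical). It computes the observed speedup from the two elapsed times in the quoted test, and the elapsed time a 2.67X or 3X speedup would imply. The 2.67X and 3X factors are the expectations stated above, not measurements; how much Hyperthreading actually helps depends on the workload.

```java
/** Hypothetical helper for the speedup arithmetic; all times are wall-clock seconds. */
final class Speedup {
    private Speedup() {}

    /** Observed speedup: single-thread elapsed time divided by multi-thread elapsed time. */
    static double speedup(double singleThreadSeconds, double multiThreadSeconds) {
        return singleThreadSeconds / multiThreadSeconds;
    }

    /** Elapsed time that a perfect speedup by expectedFactor would produce. */
    static double idealElapsed(double singleThreadSeconds, double expectedFactor) {
        return singleThreadSeconds / expectedFactor;
    }

    public static void main(String[] args) {
        double t1 = 9.25;   // processingUnitThreadCount = 1, from the quoted test
        double t10 = 6.875; // processingUnitThreadCount = 10, from the quoted test
        double ideal = idealElapsed(t1, 2.67); // the "at least 2.67X" expectation
        System.out.printf("observed speedup:        %.2fx%n", speedup(t1, t10));
        System.out.printf("elapsed at 2.67x:        %.2f s%n", ideal);
        System.out.printf("elapsed at 3x:           %.2f s%n", idealElapsed(t1, 3.0));
        System.out.printf("observed / 2.67x target: %.2f%n", t10 / ideal);
    }
}
```

Running `main` gives an observed speedup of about 1.35x, an expected elapsed time of about 3.46 s at 2.67X, and an observed-to-expected ratio just under 2.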

It seems like you have replicated the problem.  Your CPE takes about twice as long as it should.

In my case, I used about 4X the CPU, but maybe if you increased the number of entities produced,
the CPU usage ratio between 1 thread and 10 threads would also increase to the 4X level.

> A few questions come to mind:  Are you using a CPE to do the
> multithreading or something else?

I'm not using a CPE.  I create my own threads, each one running a different AnalysisEngine created
from the same AnalysisEngineDescription and same ResourceManager.  The CASs come from a CasPool.
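
The threading structure described above can be sketched in plain Java. This is a hypothetical model, not UIMA code: `FakeEngine` and `FakeCas` stand in for AnalysisEngine and CAS, an `ArrayBlockingQueue` stands in for the CasPool, and "processing" just counts whitespace-separated tokens. It only illustrates the pattern: each thread owns its engine, borrows a CAS from the shared pool, and always returns it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a CAS: holds one document's text and its annotation count. */
final class FakeCas {
    String text;
    int annotationCount;

    void reset() {
        text = null;
        annotationCount = 0;
    }
}

/** Hypothetical stand-in for an AnalysisEngine; each worker thread gets its own instance. */
final class FakeEngine {
    void process(FakeCas cas) {
        // Pretend every whitespace-separated token becomes one annotation.
        String trimmed = cas.text.trim();
        cas.annotationCount = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

final class PoolDemo {
    private PoolDemo() {}

    /** Processes docs with nThreads workers sharing poolSize CASs; returns the total annotation count. */
    static int run(List<String> docs, int nThreads, int poolSize) {
        BlockingQueue<FakeCas> casPool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            casPool.add(new FakeCas());
        }
        ConcurrentLinkedQueue<String> work = new ConcurrentLinkedQueue<>(docs);
        AtomicInteger total = new AtomicInteger();

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            FakeEngine engine = new FakeEngine(); // never shared between threads
            Thread worker = new Thread(() -> {
                try {
                    String doc;
                    while ((doc = work.poll()) != null) {
                        FakeCas cas = casPool.take(); // blocks until a CAS is free
                        try {
                            cas.text = doc;
                            engine.process(cas);
                            total.addAndGet(cas.annotationCount);
                        } finally {
                            cas.reset();
                            casPool.add(cas); // always hand the CAS back to the pool
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        // 10 workers competing for 3 pooled CASs over three tiny "documents".
        System.out.println(run(Arrays.asList("a b c", "d e", "f g h i"), 10, 3)); // 9
    }
}
```

Because each engine is confined to one thread, the only objects the workers share in this model are the pool and the work queue, both thread-safe `java.util.concurrent` types.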

> If something else, do you see the
> same behavior if you try using a CPE instead?  Does this only happen
> with large documents, and/or does it only happen when you have a lot
> of annotations in the CAS (I have very few in my test).

I'm using 17 HTML documents that average 85K (http://www.9-11commission.gov/report/index.htm).
My annotators produce a total of about 34,000 annotations from these (an average of 2008
annotations per document).

So with ten concurrent AnalysisEngines, I would have 10 CASs with a total of about 20,000
annotations in memory at once.
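
The estimate above is simple multiplication of CAS count by the per-document average. A tiny sketch (hypothetical class `AnnotationLoad`) using the 2008-per-document figure from this message:

```java
/** Hypothetical helper: annotations held by a number of CASs (or documents) at a given average. */
final class AnnotationLoad {
    private AnnotationLoad() {}

    static long annotations(int casCount, int avgAnnotationsPerCas) {
        // long avoids int overflow for large corpora
        return (long) casCount * avgAnnotationsPerCas;
    }

    public static void main(String[] args) {
        int avg = 2008; // average annotations per document, from the message
        System.out.println("all 17 documents:  " + annotations(17, avg)); // 34136
        System.out.println("10 CASs in flight: " + annotations(10, avg)); // 20080
    }
}
```

Both results line up with the "about 34,000" and "about 20,000" figures.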

Is that a lot?  How many do you have?

