uima-user mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: Why XmiCasSerializer is slow
Date Wed, 11 Jul 2007 13:28:54 GMT
On 7/10/07, greg@holmberg.name <holmberg2066@comcast.net> wrote:
> I had previously described that when I used XmiCasSerializer with
> many (10) concurrent AnalysisEngines, my throughput dropped to about
> half, and wasn't scaling up.
>
> I did some profiling of my code using JProbe, and I think I've found
> the problem.
>
> I discovered that my application spent 64% of its elapsed time in
> XmiCasSerializer and its child methods.  Within that, one method rose
> to the top: 72% of elapsed time was spent in
> TypeSystemImpl.ll_isValidTypeCode().  In fact, this exceeded the time
> spent in XmiCasSerializer
>
> This in turn was almost all in SymbolTable.getSymbol().  This was
> called over 17 million times in my application, which spent 72% of its
> elapsed time in this one method.  99.9% of its time was spent in
> itself, and not its children (Vector.get(int) was the highest child,
> at 0.1%).
>
> I'm not exactly sure why this method takes so long.  I suspect it's a
> concurrency issue.  I see a synchronized block in the set() method, so
> that would be something to look into.  Given that some of my
> AnalysisEngines may be inserting annotations while others are
> executing XmiCasSerializer, I can see potential for conflict.
>
> Hopefully, these clues will be enough for someone familiar with the
> code to figure it

Very interesting...

Vectors are internally synchronized.  All the CASes in the CAS Pool
share the same instance of the TypeSystemImpl, so they will all
synchronize when calling ll_isValidTypeCode().  I wonder if switching
the Vector to an ArrayList would help.  (Thilo, would that be safe?
If set() is itself synchronized, and if nothing else modifies the
table, then it seems like it would be.)
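
To make the idea concrete, here is a minimal sketch of a read-mostly
table backed by an ArrayList.  It is not the actual UIMA SymbolTable:
the class name ReadMostlyTable and its methods add(), get() and
isValidCode() are made up for illustration.  It also assumes the table
is fully populated before any reader thread starts, which is the
"nothing else modifies the table" condition above.  Without that
condition, unsynchronized reads of an ArrayList are not safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a symbol table; not the UIMA implementation.
public class ReadMostlyTable {
    private final List<String> symbols = new ArrayList<>();

    // Writers still serialize among themselves.
    public synchronized int add(String symbol) {
        symbols.add(symbol);
        return symbols.size() - 1;
    }

    // Unsynchronized read. Only safe if all writes finished before the
    // reading threads were started (assumption of this sketch).
    public String get(int code) {
        return (code >= 0 && code < symbols.size()) ? symbols.get(code) : null;
    }

    public boolean isValidCode(int code) {
        return get(code) != null;
    }

    public static void main(String[] args) throws InterruptedException {
        ReadMostlyTable table = new ReadMostlyTable();
        for (int i = 0; i < 100; i++) {
            table.add("type" + i); // populate before any reader exists
        }

        AtomicLong validLookups = new AtomicLong();
        Thread[] readers = new Thread[10];
        for (int t = 0; t < readers.length; t++) {
            readers[t] = new Thread(() -> {
                long valid = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    if (table.isValidCode(i % 100)) {
                        valid++;
                    }
                }
                validLookups.addAndGet(valid);
            });
            readers[t].start(); // Thread.start() publishes the filled table
        }
        for (Thread reader : readers) {
            reader.join();
        }
        System.out.println("valid lookups: " + validLookups.get());
    }
}
```

With a Vector in place of the ArrayList, every get() would take the
same lock, so the ten readers would queue up behind one another even
though none of them modifies anything.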

Also, perhaps we can drastically reduce the number of times the
serializer calls this method.  Among other places, it looks like it is
called by Type.isArray, which is in turn called by
TypeSystem.subsumes.  I'd have to run a test with the XmiCasSerializer
to see what the real call stack looks like.
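
One way the call count could be cut, sketched below, is to remember
the answer per type code inside each serializer instance so the shared
table is consulted once per type rather than once per feature
structure.  This is only an illustration of memoization, not what
XmiCasSerializer does: the class TypeCheckCache and the slowCheck
predicate are hypothetical, and the cache is assumed to be used by a
single thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical per-serializer cache; intended for use by one thread only.
public class TypeCheckCache {
    private final IntPredicate slowCheck; // stands in for the shared, contended lookup
    private final Map<Integer, Boolean> cache = new HashMap<>();

    public TypeCheckCache(IntPredicate slowCheck) {
        this.slowCheck = slowCheck;
    }

    // Calls the slow check at most once per distinct type code.
    public boolean test(int typeCode) {
        Boolean cached = cache.get(typeCode);
        if (cached == null) {
            cached = slowCheck.test(typeCode);
            cache.put(typeCode, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] slowCalls = {0};
        // Pretend that type codes 3 and 4 are array types.
        TypeCheckCache isArray = new TypeCheckCache(code -> {
            slowCalls[0]++;
            return code == 3 || code == 4;
        });

        int arrays = 0;
        for (int i = 0; i < 1_000_000; i++) {
            if (isArray.test(i % 5)) {
                arrays++;
            }
        }
        System.out.println("array hits: " + arrays + ", slow calls: " + slowCalls[0]);
    }
}
```

The trade-off is that a cache like this is only valid while the type
system does not change, so it would have to be built after the type
system is committed and discarded along with the serializer.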

