uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: Another interesting potential speedup
Date Mon, 06 Oct 2008 15:56:04 GMT
On Wed, Oct 1, 2008 at 3:21 PM, Marshall Schor <msa@schor.com> wrote:
> Profiling certainly shows unusual places you'd never think to look :-)
>
> This may be a bit of an anomaly - but we have a scaleout test for
> uima-as, sending large numbers of CASes over the wire (but the test is
> running in multiple JVMs on one machine - so there are no network
> delays).  We're running this with essentially empty CASes - just to see
> where other overhead is.
>
> We expected that things like deserialization would not show up - because
> the CASes were empty.  However, deserialization was the biggest time
> consumer.  Looking into this, it turns out that (in our particular case)
> 90% of the time in deserialization was due to creating a new XML Reader
> (the call: XMLReaderFactory.createXMLReader).  A quick search on the
> internet turned up this link:
> http://www.ibm.com/developerworks/xml/library/x-perfap2.html which
> suggested this could indeed be a bottleneck, which could be avoided by
> reusing the same XMLReader object, instead of throwing it away and
> getting a new one on every call.
>
> This would take some work (pooling, etc.) to make things thread-safe,
> but might be a good thing to do -- unless small but non-empty CASes turn
> out to bottleneck in some other way that swamps this measurement.
>
> This only applies to transports that use XML-style
> serialization/deserialization, of course.
>

That sounds like a good find!  I think pooling might not actually be
necessary to use this in UIMA-AS.  If there is a fixed number of
listener threads that do deserialization, each can just create its own
instance of the XMLReader object once during initialization and then
reuse that one object multiple times.

I don't think the uima core has to change at all.  Just don't use the
static XmiCasDeserializer.deserialize methods, which internally create
XMLReaders.  If you look inside XmiCasDeserializer.deserialize you'll
see:

    XMLReader xmlReader = XMLReaderFactory.createXMLReader();
    XmiCasDeserializer deser = new XmiCasDeserializer(aCAS.getTypeSystem());
    ContentHandler handler = deser.getXmiCasHandler(aCAS, aLenient,
        aSharedData, aMergePoint);
    xmlReader.setContentHandler(handler);
    xmlReader.parse(new InputSource(aStream));

You can do the same thing yourself in UIMA-AS, just moving the call to
XMLReaderFactory.createXMLReader to an initialization step.
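
As a rough sketch of that idea (not actual UIMA-AS code): the class
below creates its XMLReader once in the constructor and reuses it for
every parse.  The class name ReusableXmlParser and the countElements
method are made up for illustration, and a trivial element-counting
handler stands in for the handler you would get from
deser.getXmiCasHandler(...).  It assumes one instance per listener
thread, so the reader is never shared between threads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/** Hypothetical helper: create one instance per listener thread; do not share it. */
class ReusableXmlParser {
    // Created once at initialization, then reused for every message.
    private final XMLReader xmlReader;

    // XMLReaderFactory is deprecated in newer JDKs (SAXParserFactory is the
    // replacement there); it is used here because it is the call discussed above.
    @SuppressWarnings("deprecation")
    ReusableXmlParser() throws SAXException {
        this.xmlReader = XMLReaderFactory.createXMLReader();
    }

    /**
     * Parses one document with the shared reader and returns its element count.
     * In UIMA-AS the handler would instead come from
     * XmiCasDeserializer.getXmiCasHandler(...); that part is not shown here.
     */
    int countElements(InputStream in) throws IOException, SAXException {
        final int[] count = {0};
        // A fresh handler per parse; only the expensive reader is reused.
        xmlReader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        xmlReader.parse(new InputSource(in));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        ReusableXmlParser parser = new ReusableXmlParser();
        String[] docs = {"<a><b/><c/></a>", "<x/>"};
        for (String doc : docs) {
            InputStream in =
                new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8));
            System.out.println(parser.countElements(in));
        }
    }
}
```

SAX allows an XMLReader to be reused once a parse has completed, but not
concurrently, which is why this only works if each thread owns its own
instance.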

  -Adam
