uima-user mailing list archives

From Thomas Ginter <thomas.gin...@utah.edu>
Subject Re: Exception thrown during CAS serialization for Remote UIMA-AS Service
Date Fri, 15 Jun 2012 01:13:32 GMT
Jorn,

Thanks for the link to that section of documentation.  The mention of the XMLUtils class was
just what I needed.  I wrote an XmlFilter class that uses XMLUtils to detect invalid XML characters
and replace them with spaces so that our annotation offsets will still match the original
text.  I was thinking about the issue all wrong.  I was assuming that all ASCII-8 characters
are also valid XML-1.0 characters.
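
For anyone finding this thread later, here is a minimal sketch of the kind of filter described above. It is not the actual XmlFilter class, and it does not call XMLUtils, so it compiles without UIMA on the classpath. The method names are made up for illustration. It checks each code point against the XML 1.0 Char production (tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF) and overwrites anything else with spaces, one space per Java char, so the string length and all annotation offsets stay the same. The 0x1a in the stack trace is a control character below 0x20 that is not tab, LF, or CR, which is why it gets rejected.

```java
/** Illustrative sketch only; hypothetical names, not the class from this thread. */
final class XmlFilter {
    private XmlFilter() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces every invalid XML 1.0 character with a space.
     * The result always has the same length as the input, so
     * begin/end offsets computed on either string agree.
     */
    static String replaceInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);          // a lone surrogate comes back as itself
            int width = Character.charCount(cp);   // 1 or 2 Java chars
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                for (int k = 0; k < width; k++) {
                    out.append(' ');               // keep the length unchanged
                }
            }
            i += width;
        }
        return out.toString();
    }
}
```

Running the document text through replaceInvalidXmlChars before it is set on the CAS should keep the character from ever reaching the XMI serializer.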

Thanks,

Thomas Ginter
801-448-7676
thomas.ginter@utah.edu




On Jun 14, 2012, at 3:52 PM, Jörn Kottmann wrote:

> You write a string to the CAS which contains a non-xml character.
> This character cannot be serialized into XMI, and that's what this exception is about.
> 
> Have a look at our documentation explaining the issue:
> http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
> 
> Hope that helps,
> Jörn
> 
> On 06/14/2012 11:39 PM, Thomas Ginter wrote:
>> We are getting an odd error while trying to process large datasets using UIMA-AS 2.3.1.  There is an exception thrown by the XmiCasSerializer in the Client when it is in the process of serializing a CAS to be sent to a remote service.  The exception is as follows:
>> 
>> org.apache.uima.resource.ResourceProcessException
>>       at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:854)
>>       at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:885)
>>       at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.process(BaseUIMAAsynchronousEngineCommon_impl.java:734)
>>       at gov.va.vinci.flap.Client.run(Client.java:181)
>>       at gov.va.vinci.density.DensityClient.main(DensityClient.java:137)
>> Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: _, 0x1a
>>       at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
>>       at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
>>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
>>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
>>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
>>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
>>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
>>       at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1539)
>>       at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:136)
>>       at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.serializeCAS(BaseUIMAAsynchronousEngineCommon_impl.java:260)
>>       at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:779)
>>       ... 4 more
>> 
>> It happens at apparently random points when processing the corpus and is never actually "thrown" but is simply written to StdErr.  Also, the serializer never seems to return, which means the UimaAsynchronousEngine.process() method never returns and the client simply "hangs" until it is manually terminated.  To resolve this issue I have implemented text filters for the incoming CAS data to prevent anything out of the ASCII-8 range.  I have also tried switching the server and client to binary serialization strategies, but that causes the XmiCasSerializer in my UimaAsBaseListener object to return errors attempting to serialize CAS objects received in the entityProcessingComplete event.
>> 
>> Any suggestions from the UIMA masters?  How can I debug further so that I can find out A: Where is this illegal character coming from and B: How can I prevent it from happening?
>> 
>> Thanks,
>> 
>> Thomas Ginter
>> 801-448-7676
>> thomas.ginter@utah.edu
>> 
>> 
>> 
>> 
>> 
> 

