Hi Nelson,

Looking into this...

Can you please confirm that the UTF-8 encoding of the troublesome characters, in
hexadecimal, is:

    F0 96 A6 80 F0 96 A6 90 EF BF BD EF BF BD

If you have the string in Java, please try converting it to UTF-8 bytes using
something like:

    byte[] theBytes = myTestString.getBytes("UTF-8");

and then print out theBytes in hex; they should look like the above. If not,
please let us know what the values are instead.

Thanks. -Marshall

On 12/9/2016 9:02 AM, nelson rivera wrote:
> Hi, I read your explanation and saw the link, but in my case I
> don't read any XML file. I just copy the text, get a new input CAS
> from UimaAsynchronousEngine with getCAS(), set the text in the CAS, and
> send the request with sendCAS(). I use the uima-as API 2.9.0 on the client
> side. Apparently the characters are replaced by their corresponding
> entities when the CAS is serialized for sending, but I get the
> mentioned exception "org.xml.sax.SAXParseException; lineNumber: 1;
> columnNumber: 571; Character reference "&#"
> in the installed uima-as framework when it tries to deserialize the CAS
> with deserializeCasFromXmi() so that the service can process it.
>
> 2016-12-08 16:48 GMT-05:00, Marshall Schor:
>> Hi Nelson,
>>
>> I can't see the characters (sorry).
>>
>> This might be an issue caused by a discrepancy between the encoding of the
>> file being read and the encoding indicated in the XML header. Can you check
>> that those two things are the same?
>>
>> See
>> http://stackoverflow.com/questions/5165347/what-use-is-the-encoding-in-the-xml-header
>> for example.
>>
>> -Marshall
>>
>> On 12/8/2016 4:20 PM, nelson rivera wrote:
>>> I tried to process the following text in a service deployed in uima-as,
>>> because it is input to my application. This is the text: 𖦀 𖦐 � �.
>>> These characters belong to the Bamun language, and apparently are
>>> not invalid XML characters, because tools such as browsers interpret
>>> and display them.
>>> After getting a new input CAS for processing, setting the text,
>>> and sending the request, I get the exception shown below in
>>> uima-as. The uima-as framework keeps working and recovers correctly; it
>>> just does not process these characters.
>>> Could you tell me what happens with these characters? Is one of them
>>> an invalid character for the uima-as framework?
>>>
>>>
>>>
>>> 04:00:31.606 - 14:
>>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient:
>>> WARNING:
>>> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 571;
>>> Character reference "&#
>>> at
>>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
>>> at
>>> org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:187)
>>> at
>>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.deserializeCASandRegisterWithCache(ProcessRequestHandler_impl.java:222)
>>> at
>>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient(ProcessRequestHandler_impl.java:552)
>>> at
>>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:1090)
>>> at
>>> org.apache.uima.aae.handler.input.MetadataRequestHandler_impl.handle(MetadataRequestHandler_impl.java:78)
>>> at
>>> org.apache.uima.adapter.jms.activemq.JmsInputChannel.onMessage(JmsInputChannel.java:731)
>>>
>>
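
P.S. In case it helps, the hex check requested at the top of this message could be sketched roughly as follows. This is only an illustration, not code from UIMA: the class name Utf8HexDump and the method toHex are made up for this example, and the test string is rebuilt from code points (U+16980, U+16990, and two U+FFFD replacement characters) on the assumption that these are the characters in question. It uses StandardCharsets.UTF_8 rather than the "UTF-8" name only to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for this thread; not part of UIMA or uima-as.
final class Utf8HexDump {

    // Encode the string as UTF-8 and render each byte as two uppercase
    // hex digits, separated by single spaces.
    static String toHex(String text) {
        byte[] theBytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : theBytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            // Mask to 0..255 so negative byte values format correctly.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumed test input: two supplementary-plane characters followed
        // by two replacement characters. Substitute the real string here.
        String myTestString = new StringBuilder()
                .appendCodePoint(0x16980)
                .appendCodePoint(0x16990)
                .appendCodePoint(0xFFFD)
                .appendCodePoint(0xFFFD)
                .toString();
        System.out.println(toHex(myTestString));
    }
}
```

If the printed bytes differ from the sequence quoted at the top, the text was probably altered before it reached the CAS, for example by copy and paste or by an intermediate encoding step.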