uima-user mailing list archives

From: Charles Bearden <Charles.F.Bear...@uth.tmc.edu>
Subject: UIMA-AS: non-XML char in text raises SAXParseException
Date: Fri, 21 Oct 2011 17:58:16 GMT

I created a simple UIMA-AS pipeline comprising a collection reader and an 
aggregate AE, which I ran simply like so:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
   -d <deployment descriptor> \
   -c <collection reader descriptor>

Evidently, the content I wish to process has some non-XML characters in it, 
because a certain bit of data raises an exception, the heart of which appears to be:

   Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x19

The complete exception is here:
   <http://pastebin.com/rMPyAhqP>

The exception enters my code at NoteLinesFromDBReader.java:139, the line in the 
getNext() method where I get the JCas from the CAS:
   jcas = aCAS.getJCas();

I don't run into this problem when I use the old-fashioned CPE, so my thinking 
is that the CAS from the CR is being serialized before being put into the queue. 
Is the expectation in UIMA AS that I sanitize text artifacts of non-XML 
characters before the CR gets them? Or am I doing something else wrong perhaps?
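
If sanitizing in the collection reader turns out to be the expected approach, 
a minimal sketch of a filter might look like the following. It assumes the 
exception really does come from XML serialization of the CAS, and it uses the 
character ranges allowed by the XML 1.0 Char production. The class and method 
names (XmlCharSanitizer, isXml10Char, sanitize) are hypothetical and are not 
part of UIMA.

```java
final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every disallowed code point with a single space.
    // Every disallowed code point occupies exactly one Java char, so the
    // result has the same length as the input and existing character
    // offsets into the text remain valid.
    static String sanitize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            sb.appendCodePoint(isXml10Char(cp) ? cp : ' ');
        }
        return sb.toString();
    }
}
```

In getNext() this would be applied before the text goes into the CAS, for 
example jcas.setDocumentText(XmlCharSanitizer.sanitize(rawText)), where 
rawText is a placeholder for whatever string the reader pulls from the 
database. Replacing with a space rather than deleting is a design choice to 
keep annotation offsets aligned with the original text.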

Thanks for your help,
Chuck
-- 
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: Charles.F.Bearden@uth.tmc.edu
Phone: 713.500.9672

