uima-user mailing list archives

From Charles Bearden <Charles.F.Bear...@uth.tmc.edu>
Subject Re: UIMA-AS: non-XML char in text raises SAXParseException
Date Mon, 24 Oct 2011 17:21:10 GMT
On 10/21/2011 03:17 PM, Marshall Schor wrote:
> also, see the comments here:  https://issues.apache.org/jira/browse/UIMA-387

Thanks for your replies. Now that I have actually looked at the
RunRemoteAsyncAE.java code, I see the command-line argument ('-b') that I
should be able to use with runRemoteAsyncAE.sh to make it use binary
serialization.

> On 10/21/2011 1:58 PM, Charles Bearden wrote:
>> I created a simple UIMA-AS pipeline comprising a collection reader and an
>> aggregate AE, which I ran simply like so:
>> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
>>    -d <deployment descriptor> \
>>    -c <collection reader descriptor>
>> Evidently, the content I wish to process has some non-XML characters in it,
>> because a certain bit of data raises an exception, the heart of which appears
>> to be:
>>    Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x19
>> The complete exception is here:
>>    <http://pastebin.com/rMPyAhqP>
>> The point in my code at which the exception enters the picture
>> (NoteLinesFromDBReader.java:139) is the point in the .getNext() method where I
>> get the next CAS:
>>    jcas = aCAS.getJCas();
>> I don't run into this problem when I use the old-fashioned CPE, so my thinking
>> is that the CAS from the CR is being serialized before being put into the
>> queue. Is the expectation in UIMA AS that I sanitize text artifacts of non-XML
>> characters before the CR gets them? Or am I doing something else wrong perhaps?
>> Thanks for your help,
>> Chuck
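
For readers who would rather sanitize text in the collection reader than switch
serialization formats, here is a minimal sketch of one way to do it. It is not
taken from the thread or from UIMA itself: the class name XmlSanitizer and its
methods are hypothetical, and the only fact it relies on is the set of
characters that XML 1.0 permits. Each disallowed UTF-16 unit (such as the 0x19
in the exception above) is replaced one-for-one, so the text length and any
annotation offsets computed later should stay the same.

```java
// Hypothetical helper, not part of UIMA: replaces characters that XML 1.0
// does not allow so that XML (XMI) serialization of the text does not fail.
final class XmlSanitizer {
    private XmlSanitizer() {}

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of text in which every UTF-16 unit belonging to a
    // disallowed code point is replaced by the given replacement character.
    static String sanitize(String text, char replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int width = Character.charCount(cp);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Keep the length unchanged so character offsets stay valid.
                for (int k = 0; k < width; k++) {
                    out.append(replacement);
                }
            }
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "note\u0019text";
        System.out.println(sanitize(dirty, ' ')); // prints "note text"
    }
}
```

In a collection reader, a call such as XmlSanitizer.sanitize(text, ' ') would
presumably go just before the document text is set on the CAS. Whether a space
is the right replacement depends on the downstream annotators.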
