uima-user mailing list archives

From Christoph Büscher <christoph.buesc...@neofonie.de>
Subject Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator
Date Thu, 10 Dec 2009 15:43:26 GMT
Hi again,

I have now been able to reproduce the problem with the DaveDetector as well.
I wrote a short unit test that demonstrates it and can provide it upon request.

Christoph

Christoph Büscher wrote:
> Hi,
> 
> I recently encountered a problem with the XMI deserialization of a
> feature structure after calling a remote C++ AS annotator from a CPE.
> The scenario is the following:
> 
> 1. I add a custom feature structure "DocumentData" containing a String
> feature (the document URL) to the CAS in my CPE. The exact URL causing
> the problem is:
> 
> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"
>
> 2. The CAS gets serialized to XMI before it is sent to a remote C++
> TAE. I set a breakpoint in UimaSerializer.serializeCasToXmi() and see
> the following part in the XMI string:
> 
> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&amp;_psmand=1"
>
> So here the "&" character seems to be escaped correctly.
> 
> 3. When the document comes back, the same feature in the XMI string 
> received in UimaSerializer.deserializeCasFromXmi() reads:
> 
> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"
>
> and now the SAXParser throws the following exception:
> 
> org.xml.sax.SAXParseException: The reference to entity "_psmand" must end with the ';' delimiter.
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:170)
>     at ...
> 
> because the "&" comes back unescaped. I'm sure the C++ annotator
> doesn't modify this feature, and it adds its own annotations
> correctly. I suspect something goes wrong when the CAS is
> deserialized from XMI and serialized back on the C++ side.
> Do you have any idea what might cause this problem or any suggestion 
> where I can start to further narrow down the problem?
> 
> The remote C++ AE is running with version "uimacpp-2.2.2-incubating".
> 
> 
> 
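
The parser behaviour described in steps 2 and 3 can be reproduced outside
UIMA with a plain JAXP SAX parser. The following is only a minimal sketch,
not the unit test mentioned at the top of this message: the class name
AmpersandRoundTrip, its helper methods, and the bare DocumentData element
are hypothetical stand-ins for the real XMI document. It assumes nothing
about the UIMA C++ serializer itself; it only shows that an attribute value
with a raw "&" is rejected as not well-formed, while the escaped form parses
and yields the original URL again.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration; not part of UIMA.
public class AmpersandRoundTrip {

    // Escapes the characters that may not appear verbatim in a
    // double-quoted XML attribute value. "&" must be replaced first.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    // Wraps already-serialized attribute text in a stand-in element.
    static String wrap(String attributeText) {
        return "<DocumentData documentURL=\"" + attributeText + "\"/>";
    }

    // Returns the parsed documentURL value, or null if the parser
    // rejects the input with a SAXParseException (not well-formed).
    static String parseDocumentUrl(String xml) {
        final String[] result = new String[1];
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        result[0] = attributes.getValue("documentURL");
                    }
                });
            return result[0];
        } catch (SAXParseException e) {
            return null; // same kind of failure as in the stack trace above
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://www.gesundheitsnachrichten.net/live/navigation/"
                   + "live.php?navigation_id=11&_psmand=1";
        System.out.println("escaped:   " + escapeAttribute(url));
        System.out.println("raw parse: " + parseDocumentUrl(wrap(url)));
        System.out.println("esc parse: "
                + parseDocumentUrl(wrap(escapeAttribute(url))));
    }
}
```

A check of this kind on the XMI string received from the remote service,
before it is handed to deserializeCasFromXmi(), might help confirm whether
the unescaped "&" is already present in what the C++ side sends back.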


-- 
--------------------------------
Christoph Büscher
Softwareentwicklung

neofonie
Technologieentwicklung und
Informationsmanagement GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 522
fax: +49.30 24627 120
http://www.neofonie.de

Handelsregister
Berlin-Charlottenburg: HRB 67460

Geschäftsführung
Helmut Hoffer von Ankershoffen
(Sprecher der Geschaeftsfuehrung)
Nurhan Yildirim
