From: Christoph Büscher
Date: Thu, 10 Dec 2009 14:53:54 +0100
To: uima-user@incubator.apache.org
Subject: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Hi,

I recently ran into a problem with the XMI deserialization of a feature structure after calling a remote C++ AS annotator from a CPE. The scenario is the following:

1. I add a custom feature structure "DocumentData" containing a String feature (the document URL) to the CAS in my CPE. The exact URL causing the problem is:

   documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"

2. The CAS gets serialized to XMI before it is sent to a remote C++ TAE. I added a breakpoint to UimaSerializer.serializeCasToXmi() and get the following part in the XMI string:

   documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&amp;_psmand=1"

   So here the "&" character seems to be escaped correctly.

3. When the document comes back, the same feature in the XMI string received in UimaSerializer.deserializeCasFromXmi() reads:

   documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"

   and now the SAXParser throws the following exception:

   org.xml.sax.SAXParseException: The reference to entity "_psmand" must end with the ';' delimiter.
       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
       at org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:170)
       at ...

   because the "&" comes back unescaped.

I'm sure the C++ annotator in question doesn't change this feature, and it correctly adds its own annotations. I suspect something goes wrong when the CAS is deserialized from XMI and serialized back on the C++ side.

Do you have any idea what might cause this problem, or any suggestion where I can start to narrow it down further?

The remote C++ AE is running with version "uimacpp-2.2.2-incubating".
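
One way to narrow this down outside of UIMA is to feed the two variants of the attribute to a plain SAX parser. The following is only a minimal sketch: the class name XmiAmpersandCheck, its helper methods, and the bare DocumentData element are hypothetical stand-ins for the real XMI, and it uses the JDK's built-in parser, which may not be the same Xerces build as in the stack trace above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of UIMA.
class XmiAmpersandCheck {

    // Wraps a raw attribute value in a simplified stand-in for the XMI element.
    static String wrap(String attributeValue) {
        return "<DocumentData documentURL=\"" + attributeValue + "\"/>";
    }

    // Returns null if the XML is well-formed, otherwise the parser's error message.
    static String parseError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String base = "http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11";

        // Variant seen in serializeCasToXmi(): ampersand escaped as an entity.
        System.out.println("escaped:   " + parseError(wrap(base + "&amp;_psmand=1")));

        // Variant seen in deserializeCasFromXmi(): bare ampersand.
        System.out.println("unescaped: " + parseError(wrap(base + "&_psmand=1")));
    }
}
```

If the escaped variant parses and the unescaped one fails, the Java side is behaving as expected and the XMI coming back from the C++ service is the part worth inspecting.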

--
--------------------------------
Christoph Büscher
Software Development

neofonie Technologieentwicklung und Informationsmanagement GmbH
Robert-Koch-Platz 4
10115 Berlin

fon: +49.30 24627 522
fax: +49.30 24627 120

http://www.neofonie.de

Commercial register Berlin-Charlottenburg: HRB 67460

Management
Helmut Hoffer von Ankershoffen (Spokesman of the Management Board)
Nurhan Yildirim