uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Bewildered
Date Fri, 07 Mar 2008 16:30:44 GMT
Burn Lewis wrote:
> XML 1.0 does not accept all Unicode characters; the legal ones are:
> 
>      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> 
> So if you wish to serialize a CAS to a file or to a remote service, you'll
> have to avoid the 29 low-valued control characters that are legal Unicode
> but not legal XML 1.0 (and arguably useless anyway).
> 
> UIMA could replace or escape them, but both approaches have possibly
> undesirable side-effects (lost information and non-standard XML,
> respectively). At the least, this restriction should be documented.
> 

It is:

http://incubator.apache.org/uima/downloads/releaseDocs/2.2.1-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues

If your mail program doesn't like the URL, it's section 8.3.1 in the
UIMA Tutorial and Developers' Guides.

--Thilo
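As a rough illustration of the XML 1.0 character ranges quoted above, and of the "replace" option mentioned in the thread, here is a minimal Java sketch (UIMA itself is Java). It assumes you clean the document text yourself before setting it as the CAS document text; XmlCharFilter, isLegalXml10 and replaceIllegal are hypothetical names, not part of the UIMA API.

```java
// Hypothetical helper, NOT part of UIMA: checks code points against the
// XML 1.0 Char production and replaces the ones an XML serializer cannot emit.
final class XmlCharFilter {

    private XmlCharFilter() {}

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of text with each illegal code point replaced by `replacement`.
    // This loses information, which is one of the side-effects noted in the thread.
    static String replaceIllegal(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Unpaired surrogates come back as their own value (#xD800-#xDFFF),
            // which falls outside the legal ranges and is therefore replaced.
            int cp = text.codePointAt(i);
            if (isLegalXml10(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "abc\u0001def\u000Bghi\tok";
        System.out.println(replaceIllegal(doc, ' '));

        // Count the code points below #x20 that XML 1.0 rejects.
        int illegalLow = 0;
        for (int cp = 0; cp < 0x20; cp++) {
            if (!isLegalXml10(cp)) {
                illegalLow++;
            }
        }
        System.out.println(illegalLow); // prints 29
    }
}
```

Every code point XML 1.0 rejects lies in the Basic Multilingual Plane, so each one occupies a single Java char; replacing it with one char keeps the text length, and hence existing annotation offsets, unchanged.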

