uima-user mailing list archives

From "Burn Lewis" <burnle...@gmail.com>
Subject Re: Bewildered
Date Fri, 07 Mar 2008 15:56:48 GMT
XML 1.0 does not accept all Unicode characters. The legal ones are:

     #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

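So if you wish to serialize a CAS to a file or to a remote service you'll
have to avoid the 29 low-value control characters below #x20 (everything
except #x9, #xA and #xD). They are legal Unicode (but useless?), yet they
are not legal in XML 1.0.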

UIMA could replace or escape them, but both have possibly undesirable
side effects (lost information and non-standard XML, respectively). At the
least, this restriction should be documented.
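
As a rough illustration, here is a minimal Java sketch of checking text
against the character ranges quoted above before it goes into a CAS. The
class and method names (Xml10Chars, isLegal, firstIllegalIndex,
replaceIllegal) are made up for this example and are not part of UIMA.
The replace variant is the lossy option mentioned above.

```java
/** Hypothetical helper, not a UIMA class: checks text against the XML 1.0 Char ranges. */
final class Xml10Chars {
    private Xml10Chars() {}

    /** True if the code point falls in one of the XML 1.0 legal ranges. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** UTF-16 index of the first illegal code point, or -1 if the text is clean. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Replaces each illegal code point with the given char (this loses information). */
    static String replaceIllegal(String s, char replacement) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isLegal(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Something like firstIllegalIndex(documentText) could be called before
setting the document text, so the problem shows up early rather than at
serialization time. Whether to reject, replace or escape at that point is
a policy choice the sketch does not settle.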
