commons-user mailing list archives

From Peter Nuernberg <pnu...@cs.aue.auc.dk>
Subject [betwixt] escaping attribute values question
Date Sun, 22 Jun 2003 11:40:26 GMT
Hi-

    I have the following problem.  Some of my beans have attributes 
that may contain certain non-ASCII characters (for example, the "o with 
a slash" or "a-e ligature" one sees in Danish words).  When these are 
written out to XML attributes, they are not escaped.  When I try to 
read beans with attributes containing these characters, the parser 
complains:

    org.xml.sax.SAXParseException: Character conversion error: 
    "Unconvertible UTF-8 character beginning with 0xf8" (line number 
    may be too low). 
    at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
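
    One observation about the message itself: 0xf8 is the single byte 
that ISO-8859-1 uses for "o with a slash", whereas UTF-8 encodes that 
character as the two bytes 0xc3 0xb8.  So the error would be consistent 
with the file having been written in a Latin-1 style encoding and read 
back as UTF-8, though I have not confirmed that this is what happens in 
my setup.  A small check of the byte values (the class and method names 
are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Render a byte array as lowercase hex, e.g. {0xc3, 0xb8} -> "c3b8".
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // True if the bytes decode as UTF-8 without any malformed input.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input by throwing.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String oSlash = "\u00f8"; // "o with a slash"
        byte[] latin1 = oSlash.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = oSlash.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1: " + toHex(latin1)); // f8
        System.out.println("UTF-8:      " + toHex(utf8));   // c3b8
        System.out.println("f8 valid as UTF-8? " + isValidUtf8(latin1)); // false
    }
}
```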

    I've looked at the way in which attribute values are escaped, but 
haven't found much useful there.  It seems characters such as 
ampersands are escaped, but non-ASCII characters are not.  I'm not sure 
whether this is really a betwixt bug - maybe the parser is incorrectly 
rejecting well-formed XML...  I know that if I replace the character in 
the XML file with a character reference (e.g., o slash with &#248;), 
the file is read correctly, but it is then saved incorrectly (since the 
ampersand is now escaped, yielding &amp;#248; in the example above).

    I've recently ported my system from my own home-brewed XML 
persistence mechanism to betwixt.  In my old system, I escaped all 
non-ASCII characters with their equivalent decimal codes.  I've looked 
through the FAQ and the user mailing lists, but didn't see a reference 
to this problem.  Can someone provide a hint as to how to get around 
it?  Would it be preferable to simply escape non-ASCII characters?  (If 
so, I could certainly provide a first pass at the code for that.)
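
    To make the idea concrete, a first pass might look roughly like the 
sketch below.  The class and method names are hypothetical and not part 
of betwixt.  It escapes the XML markup characters itself and writes 
every non-ASCII character as a decimal character reference, so it would 
have to replace the existing attribute escaping rather than run in 
addition to it (otherwise the ampersands would be escaped twice, as 
described above).

```java
public class NonAsciiEscaper {

    // Escape a string for use as an XML attribute value: markup
    // characters become named entities, and anything outside 7-bit
    // ASCII becomes a decimal character reference such as &#248;.
    public static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); ) {
            // Work on code points so surrogate pairs yield one reference.
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "rød grød & fløde", written with Unicode escapes
        System.out.println(escapeAttributeValue("r\u00f8d gr\u00f8d & fl\u00f8de"));
    }
}
```

    Since the output contains only ASCII, it should read back the same 
way regardless of which encoding the file is written in.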

    Thanks for any help.

-Peter


