commons-user mailing list archives

From Peter Nuernberg <>
Subject [betwixt] escaping attribute values question
Date Sun, 22 Jun 2003 11:40:26 GMT

    I have the following problem.  I have some beans with 
attributes that may contain certain non-ASCII characters (for example, 
the "o with a slash" or "a-e ligature" one sees in Danish words).  When 
these are written out to XML attributes, they are not escaped.  When I 
try to read beans with attributes containing these characters, the 
parser complains (org.xml.sax.SAXParseException: Character conversion 
error: "Unconvertible UTF-8 character beginning with 0xf8" (line number 
may be too low). at 
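
    For illustration, one possible explanation of the 0xf8 in the error 
message is that the file was written in ISO-8859-1 (or a similar 
single-byte encoding) but parsed as UTF-8.  That is an assumption, not 
something confirmed here.  The sketch below uses only the standard JDK; 
the class and method names are made up for the example and are not part 
of betwixt.  It shows that "o with a slash" is the single byte 0xf8 in 
ISO-8859-1 and that this byte on its own is not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for this example; not part of betwixt.
class EncodingMismatchDemo {

    // Returns true if the given bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "\u00f8"; // "o with a slash"

        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);

        // One byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println("ISO-8859-1 first byte: 0x"
            + Integer.toHexString(latin1[0] & 0xff));
        System.out.println("UTF-8 length: " + utf8.length);

        // The ISO-8859-1 byte is rejected by a strict UTF-8 decoder.
        System.out.println("latin1 valid as UTF-8? " + isValidUtf8(latin1));
        System.out.println("utf8 valid as UTF-8?   " + isValidUtf8(utf8));
    }
}
```

    If that assumption holds, writing the file through a Writer with an 
explicit UTF-8 encoding (rather than the platform default) would be 
another thing to try.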

    I've looked at the way in which attribute values are escaped, but 
haven't found much useful there.  It seems characters such as ampersands 
are escaped, but non-ASCII characters are not.  I'm not sure whether 
this is really a betwixt bug - maybe the parser is incorrectly rejecting 
well-formed XML...  I know that if I replace the character in the XML 
file with an escape (e.g., o slash with &#248;), the file is read 
correctly, but it is then saved incorrectly (since the 
ampersand will now be escaped, yielding &amp;#248; in the example above).

    I've recently ported my system from my own home-brewed XML 
persistence mechanism to betwixt.  In my system, I escaped all non-ASCII 
characters with their equivalent decimal codes.  I've looked through the 
FAQ and user mailing lists, but didn't see a reference to this problem. 
Can someone provide a hint as to how to get around this problem? 
Would it be preferable to simply escape non-ASCII characters?  (If so, 
I could certainly provide a first pass at the code for that.)
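
    To make the proposal concrete, here is a minimal sketch of the kind 
of escaping I have in mind: the usual markup characters are replaced 
with named entities, and every character above 0x7f is written as a 
decimal character reference.  The class and method names are 
hypothetical and this is not betwixt's actual escaping code, just a 
standalone illustration.

```java
// Hypothetical standalone escaper; not part of betwixt.
class AttributeEscaper {

    // Escapes markup characters and all non-ASCII characters so the
    // result can be placed inside a double-quoted XML attribute value.
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            // Work on code points so surrogate pairs become one reference.
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7f) {
                        // Decimal character reference, e.g. &#248;
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("S\u00f8ren & \u00c6"));
    }
}
```

    Because the ampersand is escaped from the raw bean value and the 
character references are generated in the same pass, the output is 
pure ASCII and the parser should hand back the original characters 
when the file is read again.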

    Thanks for any help.

