commons-user mailing list archives

From robert burrell donkin <>
Subject Re: [betwixt] escaping attribute values question
Date Mon, 23 Jun 2003 22:37:08 GMT
hi peter

betwixt escapes characters as per the xml specification. i suspect that 
your problem is that the encoding named in the xml declaration is not the 
same as the encoding your code actually uses when it writes the output.
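To make the encoding point concrete, here is a small JDK-only sketch. It does not use betwixt at all: the class `EncodingMismatch`, its helper methods and the `person`/`name` markup are invented for illustration, and the JDK's built-in DOM parser stands in for crimson. It writes the same attribute value ("Søren") three ways and parses it back. When the declared encoding matches the charset of the `Writer`, parsing succeeds. When the declaration says UTF-8 but the bytes are written as ISO-8859-1, the "ø" becomes the single byte 0xF8, which is not a valid UTF-8 lead byte, so the parser rejects the document, which is the same kind of failure as in the report quoted below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatch {

    // Hypothetical helper: the XML declaration claims `declared`,
    // but the bytes are actually produced with the `actual` charset.
    static byte[] write(String declared, Charset actual) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, actual)) {
            w.write("<?xml version=\"1.0\" encoding=\"" + declared + "\"?>\n");
            w.write("<person name=\"S\u00f8ren\"/>"); // \u00f8 is "o with a slash"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: parse the bytes and return the name attribute,
    // or null if the parser rejects the document.
    static String parseName(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getAttribute("name");
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Declaration and writer agree: both documents parse successfully.
        System.out.println(parseName(write("UTF-8", StandardCharsets.UTF_8)) != null);
        System.out.println(parseName(write("ISO-8859-1", StandardCharsets.ISO_8859_1)) != null);
        // Declared UTF-8, written as Latin-1: byte 0xF8 is invalid UTF-8, so parsing fails.
        System.out.println(parseName(write("UTF-8", StandardCharsets.ISO_8859_1)) != null);
    }
}
```

In this sketch the fix is entirely on the writing side: whatever `Writer` receives the generated xml should use the same charset that the declaration names, and no extra escaping of non-ascii characters is needed for the document to be well-formed.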

i wouldn't support escaping characters into entities by default since 
this is a controversial area without any clear consensus about the use 
case. i would prefer to see an optional strategy added to betwixt which 
would flexibly process text after it was generated, but i don't have the 
time to code something like that myself at the moment.
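As a rough illustration of what such an optional strategy might look like, here is a standalone Java sketch. The interface `AttributeEscapingStrategy` and the constant `NON_ASCII_AS_CHAR_REFS` are hypothetical names, not part of betwixt's API, and how a strategy like this would be plugged into betwixt's writer is not shown. The point it demonstrates is ordering: the markup characters and the non-ascii characters are handled in a single pass over the raw value, so the `&` emitted for a reference such as `&#248;` is never escaped again into `&amp;#248;`.

```java
public class NonAsciiEscaping {

    // Hypothetical hook (NOT betwixt API): turns a raw attribute value into escaped text.
    interface AttributeEscapingStrategy {
        String escape(String value);
    }

    // Escapes the predefined XML entities and replaces every non-ASCII code point
    // with a decimal character reference, e.g. U+00F8 becomes &#248;.
    // Both are done in one pass over the raw value, so nothing is escaped twice.
    static final AttributeEscapingStrategy NON_ASCII_AS_CHAR_REFS = value -> {
        StringBuilder sb = new StringBuilder(value.length());
        value.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        // Non-ASCII: emit a decimal character reference.
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        });
        return sb.toString();
    };

    public static void main(String[] args) {
        String raw = "S\u00f8ren & \u00c6r\u00f8"; // "Søren & Ærø"
        System.out.println(NON_ASCII_AS_CHAR_REFS.escape(raw));
        // prints: S&#248;ren &amp; &#198;r&#248;
    }
}
```

Because the output is pure ascii, a document escaped this way parses the same whatever ascii-compatible encoding is declared, which is why it would have to stay an opt-in strategy rather than the default behaviour.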

if you feel like contributing a patch to do this, then it would be 
gratefully accepted. see 
and for some guidelines 
about how to go about this. please include test cases.

- robert

On Sunday, June 22, 2003, at 12:40 PM, Peter Nuernberg wrote:

> Hi-
>    I have the following problem.  I have some beans that have attributes 
> that may contain certain non-ascii characters (for example, the "o with a 
> slash" or "a-e ligature" one sees in Danish words).  When these are 
> written out to xml attributes, they are not escaped.  When I try to read 
> beans with attributes containing these characters, the parser complains 
> (org.xml.sax.SAXParseException: Character conversion error: 
> "Unconvertible UTF-8 character beginning with 0xf8" (line number may be 
> too low). at org.apache.crimson.parser.InputEntity.fatal(
> 1100))
>    I've looked at the way in which attribute values are escaped, but 
> haven't found much useful there.  It seems characters such as ampersands 
> are escaped, but not non-ascii characters.  I'm not sure this is really a 
> betwixt bug or not - maybe the parser is incorrectly rejecting 
> well-formed xml...  I know that if I replace the character in the xml 
> file with an escape (e.g., o slash with &#248;), the file is read 
> correctly, but then it is consequently saved incorrectly (since the 
> ampersand will now be escaped, yielding &amp;#248; in the example above).
>    I've recently ported my system from my own home-brewed xml persistence 
> mechanism to betwixt.  In my system, I escaped all non-ascii characters 
> with their equivalent decimal codes.  I've looked through the faq and 
> user mailing lists, but didn't see a reference to this problem.  Can 
> someone provide a hint as to how to get around this problem? Would it be 
> preferable to simply escape non-ascii characters?  (If so, I could 
> certainly provide a first-pass at the code for that.)
>    Thanks for any help.
> -Peter