cocoon-dev mailing list archives

From Antonio Gallardo <agalla...@agssa.net>
Subject Re: Entity escaping in o.a.c.c.serializers.XHTMLSerializer
Date Thu, 22 Jan 2009 16:14:37 GMT
Hi Andreas,

We hit the same issue some years ago and we found a more pragmatic solution:

In org.apache.cocoon.components.serializers.encoding.XHTMLEncoder add
the line marked with a + sign:


    private static final char ENCODINGS[][][] = {
+    { { 39 } , "&#39;".toCharArray() },
       { { 160 } , "&nbsp;".toCharArray() },
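
As a rough illustration of what such a table entry does, here is a small
self-contained sketch. It is not the actual XHTMLEncoder code: the class
name EntityTableSketch and the encode method are made up for this example,
and the real encoder may look up and apply the table differently.

```java
public class EntityTableSketch {
    // Hypothetical table in the same shape as the snippet above:
    // each row pairs a one-character array with its replacement text.
    private static final char[][][] ENCODINGS = {
        { { 39 }, "&#39;".toCharArray() },   // apostrophe
        { { 160 }, "&nbsp;".toCharArray() }, // non-breaking space
    };

    // Replace every character found in the table with its replacement;
    // all other characters are copied through unchanged.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            char[] replacement = null;
            for (char[][] row : ENCODINGS) {
                if (row[0][0] == c) {
                    replacement = row[1];
                    break;
                }
            }
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("it's")); // prints it&#39;s
    }
}
```

With the extra row in place, an apostrophe is written out as the numeric
reference &#39; while characters that are not in the table pass through as is.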


See:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Entities_representing_special_characters_in_XHTML

Please let me know if this fixes the issue; I will gladly commit the fix.

Best Regards,

Antonio Gallardo.


Andreas Hartmann wrote:
> Hi Cocoon devs,
>
> this issue has already been discussed several times, e.g. [1], but
> AFAIK has not been resolved yet.
>
> The XHTMLSerializer, or, more specifically, the XHTMLEncoder, from the
> serializers block in Cocoon 2.1.x replaces every character that has a
> corresponding HTML 4.0 character entity reference with that entity
> reference. This causes issues with inline JavaScript, since e.g. the
> double quotes are transformed to &quot;, which causes a JavaScript
> parsing error. Another minor negative effect is the increased document
> size.
>
> If I understand the W3C correctly, see e.g. [2], the recommended
> approach is to use the character set of the encoding as far as possible,
> and use escapes only in exceptional circumstances. I didn't find a
> reason why the XHTMLSerializer uses escapes, but I suspect that it is
> related to browser compatibility issues.
>
> Do you think it would make sense to make this behaviour configurable,
> e.g.
>
>   <use-entities>true|false</use-entities>
>
> Does the XHTMLSerializer in Cocoon 2.2 show a different behaviour?
>
> TIA for any comments!
>
> -- Andreas
>
>
> [1]
> http://www.nabble.com/Problem-with-XHTMLSerializers-to1311360.html#a1311360
>
> [2] http://www.w3.org/International/tutorials/tutorial-char-enc/
>
>

