cocoon-users mailing list archives

From Sjur Nørstebø Moshagen
Subject Re: [i18n] Xalan replaces encoded characters with entities
Date Wed, 28 Apr 2004 05:46:28 GMT
On 28 Apr 2004 at 00:58, Joerg Heinicke wrote:

> On 20.04.2004 11:02, Sjur Nørstebø Moshagen wrote:
>> all my XML documents are completely in UTF-8, but Cocoon outputs 
>> entities for many non-ASCII characters. Although this does not create 
>> any badly formatted pages, it does increase the size of the output 
>> HTML file (most such UTF-8 characters take 2 bytes, whereas the 
>> entities regularly take 7 or more bytes), and it seems both unnecessary 
>> and extra work in an all-UTF-8 context, for both the server and 
>> the client. As my site contains a lot of these characters, I would 
>> like to turn it off. But that doesn't seem to be possible:
>> After some searching I hunted down the following paragraph in the 
>> description for XalanJ 2.6.0 
>> (
>>> • For HTML output, Xalan-Java 2 outputs character entity 
>>> references (&copy; etc.) for the special characters designated in  
>>> Appendix A. DTDs of the XHTML 1.0: The Extensible HyperText Markup  
>>> Language. Xalan-Java 1.x, on the other hand, outputs literal 
>>> characters for some of these special characters.
>> That is, it seems to be the default behaviour, and I have found no 
>> Cocoon or other documentation or tips on how to change it. Can anyone 
>> help me with this?
> I don't know of any option to influence this behaviour.

Thanks for the answer. Given the lack of responses (apart from yours) 
and the general lack of documentation on this feature, I have accepted 
the behaviour as intended and not changeable. The "solution" would be 
to switch the output from HTML to XML (e.g. XHTML). On the other hand, 
the behaviour has the nice (most likely intended) side effect that even 
browsers and operating systems too old to support Unicode/UTF-8 will be 
able to render all non-ASCII characters that are encoded as entities. 
Not that this is very useful on my site, but it _does_ make it possible 
to read the help/info pages that explain the character set issues 
involved for the site, and how possible browser problems can be resolved.
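
For anyone who wants to try the HTML-to-XML switch outside Cocoon, here is a minimal sketch using the standard JAXP API (javax.xml.transform). The class name OutputMethodDemo and the helper serialize are made up for this illustration, and it does not touch the Cocoon sitemap at all. It runs an identity transform on a small document and serializes it once with the "xml" output method and once with "html", both as UTF-8. What the "html" run prints depends on which XSLT processor is on the classpath; going by the Xalan note quoted above, Xalan-Java 2 would be expected to emit entity references there.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of Cocoon or Xalan.
class OutputMethodDemo {

    // Serializes the given XML text with the requested output method
    // ("xml" or "html"), always asking for UTF-8 as the output encoding.
    static String serialize(String xml, String method) {
        try {
            // newTransformer() without a stylesheet gives an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (TransformerException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // "blåbærsyltetøy", written with escapes so the source file
        // encoding does not matter.
        String doc = "<p>bl\u00e5b\u00e6rsyltet\u00f8y</p>";
        System.out.println("xml : " + serialize(doc, "xml"));
        System.out.println("html: " + serialize(doc, "html"));
    }
}
```

Comparing the two printed lines shows whether the processor in use replaces the non-ASCII characters in HTML mode while leaving them literal in XML mode.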

So for the time being I won't do anything to change the output, despite 
the increased size.
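
To put rough numbers on that size increase, the following small sketch compares the UTF-8 byte length of a few Norwegian letters with the length of their HTML 4 named entities. The class name EntitySizeDemo and the helper utf8Bytes are invented for the example, and the three letters are just a sample, not a measurement of any real page.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for comparing literal vs. entity sizes.
class EntitySizeDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Each pair: the literal character and its HTML 4 named entity.
        String[][] pairs = {
            {"\u00f8", "&oslash;"},  // o with stroke
            {"\u00e5", "&aring;"},   // a with ring above
            {"\u00e6", "&aelig;"},   // ae ligature
        };
        for (String[] p : pairs) {
            System.out.println(p[1] + " literal: " + utf8Bytes(p[0])
                    + " bytes, entity: " + utf8Bytes(p[1]) + " bytes");
        }
    }
}
```

Each of these letters takes 2 bytes as literal UTF-8 and 7 or 8 bytes as a named entity, which matches the estimate in the quoted message.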

