xml-xalan-dev mailing list archives

From david_n_bert...@us.ibm.com
Subject Re: Is this a bug? (numbered entity references in produced HTML)
Date Tue, 01 Feb 2005 16:59:29 GMT
> Problem: HTML formatter writes numbered entity references
> instead of characters in output encoding (specified in
> "xsl:output" tag), despite the fact that output encoding
> supports these characters.

It shouldn't be a problem, because any web browser should render the 
character correctly.

> Formatter writes numbered entity reference for the
> character if the character is greater than the maximum
> character for the encoding (m_maxCharacter), which is
> always 0x7F for non-standard encodings
> (XalanTranscodingServices::getMaximumCharacterValue).
> This makes HTML documents incredibly large when custom
> encoding, added with XMLTransService::addEncoding() is
> used. Produced documents contain numbered entity reference
> for every locale-specific character, because they all have
> codes >0x007Fu.
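
To make the size effect concrete, here is a rough sketch of the behavior described in the quoted text. It is not Xalan code: the function name `escape_above` is made up for illustration, and the decimal form of the reference is an assumption (the 0x7F threshold is the one mentioned above).

```python
def escape_above(text: str, max_char: int = 0x7F) -> str:
    """Replace every character above max_char with a decimal numeric reference.

    Hypothetical helper: mimics a serializer that only compares each
    character against a single maximum value for the encoding.
    """
    return "".join(ch if ord(ch) <= max_char else f"&#{ord(ch)};" for ch in text)


sample = "Привет"                # six Cyrillic letters, all above 0x7F
escaped = escape_above(sample)   # every letter becomes a 7-character reference

print(escaped)
print(len(escaped), "characters instead of", len(sample.encode("cp1251")), "bytes")
```

With a single-byte encoding such as windows-1251, each of these letters would fit in one byte, so the escaped form is roughly seven times larger for text of this kind.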

Yes, this is a known problem with the design of the serializers.  I 
started working on this about a year ago, but it has not been a high 
priority, because very few people have complained about it.  If the 
generated files are too big, you might want to choose UTF-8 as the output 
encoding.  Technically, XSLT processors are only required to support 
UTF-8 and UTF-16, and fixing this could have a significant performance 
impact on serialization, because it requires that we look up every 
character to determine whether the target encoding can represent it.
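
As a rough illustration of what that per-character lookup means, here is a sketch that uses Python's standard codecs rather than Xalan's transcoding services. The names `can_encode` and `serialize` are made up for the example, and it assumes a single-byte, ASCII-compatible target encoding.

```python
def can_encode(ch: str, encoding: str) -> bool:
    """Return True if the target encoding can represent this one character."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def serialize(text: str, encoding: str) -> bytes:
    """Write each character natively when possible, else as a numeric reference.

    Assumes an ASCII-compatible, single-byte encoding; the check runs once
    per character, which is where the extra serialization cost comes from.
    """
    out = []
    for ch in text:
        if can_encode(ch, encoding):
            out.append(ch.encode(encoding))
        else:
            out.append(f"&#{ord(ch)};".encode("ascii"))
    return b"".join(out)


# Cyrillic letters exist in windows-1251, the CJK character does not.
result = serialize("Да 中", "cp1251")
print(result)
```

Only the character the encoding cannot represent is escaped, but the price is one lookup per character written. A real serializer would probably want to cache the results, for example in a table per encoding, rather than probe the transcoder each time.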

> If this is a bug, can somebody register it? I couldn't do
> this through JIRA web interface.

As long as you're registered, you should be able to create a bug report.

