xml-xalan-dev mailing list archives

From Oleg Volkov <o...@unicorn.kiev.ua>
Subject RE: Is this a bug? (numbered entity references in produced HTML)
Date Wed, 02 Feb 2005 14:14:17 GMT
Thank you for the reply. Indeed, it is not a very big problem,
because one can always choose UTF-8 for the encoding. But files
in UTF-8 that consist mostly of national characters require
twice as much disk space as those in the native encoding, and
open a little more slowly in a web browser. I use Xalan to
generate large reports, and these aspects are important to me.
I understand that the problem is unlikely to be resolved in the
near future, so I have no choice but to use UTF-8.
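
To put rough numbers on the size difference, here is a minimal sketch
in C++. It is not Xalan code: utf8Length, utf8Size and
numericReferenceSize are hypothetical helpers, and the native encoding
is assumed to be a single-byte one, so its size is simply the number of
characters.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8.
std::size_t utf8Length(char32_t c)
{
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c < 0x10000) return 3;
    return 4;
}

// Total size of the text when written as UTF-8.
std::size_t utf8Size(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t c : text) {
        total += utf8Length(c);
    }
    return total;
}

// Total size when every character above 0x7F is written as a decimal
// numeric reference such as "&#1055;" and ASCII is written as one byte.
std::size_t numericReferenceSize(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t c : text) {
        if (c < 0x80) {
            total += 1;
        } else {
            // "&#" + decimal digits + ";"
            total += 3 + std::to_string(static_cast<unsigned long>(c)).size();
        }
    }
    return total;
}
```

For a six-letter Cyrillic word, this gives 6 bytes in a single-byte
encoding, 12 bytes in UTF-8, and 42 bytes as numeric references.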

----- Original Message -----
From: david_n_bertoni@us.ibm.com
To: xalan-dev@xml.apache.org
Date: Tue, 1 Feb 2005 08:59:29 -0800
Subject: Is this a bug? (numbered entity references in produced HTML)

>> Problem: The HTML formatter writes numbered entity references
>> instead of characters in output encoding (specified in
>> "xsl:output" tag), despite the fact that output encoding
>> supports these characters.

 > It shouldn't be a problem, because any web browser should render the
 > character correctly.

>> The formatter writes a numbered entity reference for a
>> character if the character is greater than the maximum
>> character for the encoding (m_maxCharacter), which is
>> always 0x7F for non-standard encodings
>> (XalanTranscodingServices::getMaximumCharacterValue).
>> This makes HTML documents incredibly large when a custom
>> encoding, added with XMLTransService::addEncoding(), is
>> used. The produced documents contain a numbered entity
>> reference for every locale-specific character, because they
>> all have codes >0x007Fu.
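
The threshold rule described above can be sketched roughly as follows.
This is an illustration only, not the Xalan serializer: escapeAboveMax
is a hypothetical function, maxCharacter stands in for m_maxCharacter,
and the escaping of markup characters such as '<' and '&' is left out.

```cpp
#include <string>

// Replaces every code point above maxCharacter with a decimal numeric
// reference and copies all other code points through unchanged.
// Transcoding to the output encoding would happen in a later step.
std::u32string escapeAboveMax(const std::u32string& text, char32_t maxCharacter)
{
    std::u32string out;
    for (char32_t c : text) {
        if (c <= maxCharacter) {
            out += c;
        } else {
            out += U"&#";
            for (char digit : std::to_string(static_cast<unsigned long>(c))) {
                out += static_cast<char32_t>(digit);
            }
            out += U';';
        }
    }
    return out;
}
```

With maxCharacter set to 0x7F, the input U"A\u041F" becomes
U"A&#1055;". With a higher limit such as 0xFFFF the same input passes
through unchanged, which is why a fixed limit of 0x7F inflates output
for encodings that could have represented the character directly.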

 > Yes, this is a known problem with the design of the serializers.  I
 > started working on this about a year ago, but it has not been a high
 > priority, because very few people have complained about it.  You might
 > want to choose UTF-8 as the output encoding, if the size of the generated
 > files is too big.  Technically, XSLT processors are only required to
 > support UTF-8 and UTF-16, and fixing this has a potentially significant
 > performance impact on serialization, because it requires us to look up every
 > character to determine if the target encoding can represent it.
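
A per-character lookup of the kind mentioned above might look roughly
like this. It is a sketch under assumptions, not the Xalan design:
ReverseMap and encodeWithLookup are hypothetical names, the target is
assumed to be a single-byte encoding that is ASCII-compatible below
0x80, and the map from code point to byte is supplied by the caller.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical reverse table: Unicode code point -> byte in the target encoding.
using ReverseMap = std::unordered_map<char32_t, unsigned char>;

// Writes ASCII directly, writes mapped characters as their single byte,
// and falls back to a decimal numeric reference only for characters
// that the target encoding cannot represent.
std::string encodeWithLookup(const std::u32string& text, const ReverseMap& encoding)
{
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {
            out += static_cast<char>(c);
            continue;
        }
        // This lookup runs once per non-ASCII character; it is the extra
        // per-character cost that a fixed threshold avoids.
        auto it = encoding.find(c);
        if (it != encoding.end()) {
            out += static_cast<char>(it->second);
        } else {
            out += "&#" + std::to_string(static_cast<unsigned long>(c)) + ";";
        }
    }
    return out;
}
```

For example, with a deliberately partial map that contains only
U+041F -> 0xCF (an illustrative entry), the input U"A\u041F\u4E2D"
produces the bytes 'A', 0xCF, followed by "&#20013;".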

>> If this is a bug, can somebody register it? I couldn't do
>> this through JIRA web interface.

 > As long as you're registered, you should be able to create a bug report.

 > Dave
