forrest-dev mailing list archives

From Sjur Nørstebø Moshagen
Subject Re: Forrest and UTF-8
Date Tue, 11 May 2004 10:42:17 GMT
On 11 May 2004, at 13:00, Fabrice Bacchella wrote:

> On 10 May 04, at 19:04, Juan Jose Pablos wrote:
>> +1. UTF-8 should be the default encoding.
> Mmm. Does anybody know the minimum version of Internet Explorer or 
> Netscape that supports UTF-8? Using UTF-8 might break some not-so-old 
> browsers.

The alternative is to support a (potentially large) set of other 
encodings (which we probably should do anyway). But considering that:
- ASCII is a proper subset of UTF-8, and
- Xalan, when serializing to HTML, will write every character that is 
defined as an entity in the HTML spec as that entity (= ASCII) (the defined 
entities cover most of the non-ASCII part of the 8859 series, as well 
as other characters),

UTF-8 should be no problem for most browsers, even old ones. And 
UTF-8 solves a lot of _other_ encoding problems in a multilingual 
world, many of which are just as problematic for old browsers as UTF-8.
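
As a rough illustration of the two points above, here is a minimal Python sketch. It is not Xalan's code: `to_entities` is a hypothetical helper that only imitates the described behaviour, using the HTML 4.01 entity table from Python's standard library. It shows that plain ASCII text is byte-for-byte identical in UTF-8, and that text whose non-ASCII characters all have named entities ends up as pure ASCII.

```python
from html.entities import codepoint2name  # HTML 4.01 named entities


def to_entities(text: str) -> str:
    """Replace each non-ASCII character that has a named HTML entity
    with that entity; leave everything else untouched.
    (Hypothetical helper, only an approximation of the serializer.)"""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 127 and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)


# Point 1: ASCII-only text produces the same bytes in ASCII and in UTF-8.
plain = "Forrest and UTF-8"
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# Point 2: after entity replacement the sample contains only ASCII,
# so a browser that knows nothing about UTF-8 still reads it correctly.
sample = "Nørstebø café"
escaped = to_entities(sample)
only_ascii = all(ord(c) < 128 for c in escaped)

print(same_bytes, escaped, only_ascii)
```

Characters without a named entity pass through unchanged in this sketch, so they would still be written as multibyte UTF-8.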

I first perceived the Xalan behaviour as buggy, since it generates 
unnecessarily large files in a UTF-8 setting (an entity uses more space 
than a multibyte UTF-8 character), but considering backwards 
compatibility the behaviour is actually not so bad.
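
To put numbers on the size difference, a small Python sketch follows. The helper `sizes` is made up for illustration, and the entity names come from Python's HTML 4.01 table, which I assume matches what an HTML serializer would emit.

```python
from html.entities import codepoint2name  # HTML 4.01 named entities


def sizes(ch: str) -> tuple[int, int]:
    """Return (bytes as UTF-8, bytes as named entity) for one character.
    Hypothetical helper; raises KeyError if ch has no named entity."""
    utf8_len = len(ch.encode("utf-8"))
    entity = f"&{codepoint2name[ord(ch)]};"
    return utf8_len, len(entity)  # the entity is ASCII, so chars == bytes


for ch in "øé€":
    utf8_len, entity_len = sizes(ch)
    print(f"{ch}: {utf8_len} bytes as UTF-8, {entity_len} bytes as entity")
```

For these three characters the entity form is two to four times as large as the raw UTF-8 bytes, which is the overhead being traded for compatibility with old browsers.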

To sum up:
+1 - UTF-8 should be default, with alternative encodings available as 
an option.

