cocoon-dev mailing list archives

From "Antonio Gallardo" <agalla...@agssa.net>
Subject Re: svn commit: r111262 - in cocoon/branches/BRANCH_2_1_X/src: java/org/apache/cocoon/components/flow webapp/WEB-INF
Date Thu, 09 Dec 2004 08:56:20 GMT
On Thu, 9 December 2004, 2:49, Leszek Gawron wrote:
> Bertrand Delacretaz wrote:
>> On 9 Dec. 04, at 09:21, Leszek Gawron wrote:
>>
>>> ...By the way: it is a little bit different on win32. Some tools
>>> detect UTF encoding by checking for a BOM. If there is none, ANSI
>>> encoding is assumed...
>>
>>
>> AFAIU this is ok for 16-bit based encodings, not for UTF-8.
>>
>> -Bertrand
> http://www.xencraft.com/resources/unicodebom.html
> <quote>
> Even though UTF-8 does not need a BOM to indicate endianness, Microsoft
> Notepad began prepending a BOM to its UTF-8 text files. Actually, it is
> a conversion of U+FEFF to an encoding as UTF-8 serialized bytes: EF BB
> BF (or in 4GL: CHR(15711167)). There is some value in the BOM being used
> as a file signature, indicating the plain text file is encoded as
> Unicode UTF-8, as opposed to some other code page. That particular
> 3-byte sequence is unlikely to represent data in any other code page,
> given the text is supposed to be human readable in some language.
> However, there is some small possibility that it represents some string
> in some code page... Because Microsoft did it, and there is so much
> Notepad data out there, the UTF-8 BOM became a de facto standard and
> then a de jure standard. (Although the BOM is optional.)
> </quote>
>
> M$ again.

This is the standard:

http://www.zvon.org/tmRFC/RFC3023/Output/chapter8.html#sub1 :-D
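
For illustration, here is a minimal Java sketch of the BOM check discussed above: it looks for the optional EF BB BF signature (U+FEFF serialized as UTF-8) and skips it before decoding, so that files with and without the BOM decode to the same text. It assumes Java 9+ (for readNBytes/readAllBytes). The class and method names (Utf8Bom, hasUtf8Bom, skipUtf8Bom) are hypothetical and are not part of any Cocoon API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Cocoon API: detect and skip an optional UTF-8 BOM.
final class Utf8Bom {
    // U+FEFF serialized as UTF-8 bytes: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private Utf8Bom() {}

    /** Returns true if the buffer starts with the UTF-8 BOM. */
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    /** Wraps a stream so that a leading UTF-8 BOM, if present, is consumed. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pb.readNBytes(head, 0, head.length); // Java 9+
        boolean isBom = n == UTF8_BOM.length && hasUtf8Bom(head);
        if (!isBom && n > 0) {
            // Not a BOM: push the peeked bytes back so no data is lost.
            pb.unread(head, 0, n);
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        System.out.println(hasUtf8Bom(withBom));    // true
        System.out.println(hasUtf8Bom(withoutBom)); // false
        InputStream s = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println(new String(s.readAllBytes(), StandardCharsets.UTF_8)); // hi
    }
}
```

The stream is peeked with a PushbackInputStream rather than reset with mark/reset, so it also works for streams that do not support marking.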

Best Regards,

Antonio Gallardo.

