tomcat-users mailing list archives

From André Warnier
Subject Re: Tomcat 5 and UTF-8
Date Fri, 03 Apr 2009 09:15:01 GMT
Gregor Schneider wrote:
> And it's getting really nuts, when it comes to UTF-8: Talking about
> UTF-8 with or without BOM? Even the specs are not clear about that.
Actually, a UTF-8 stream should /never/ need a BOM, because there is no 
byte-order, UTF-8 being by definition byte-oriented.
The only problem is that some editors, MS-Windows Notepad for instance, 
add a BOM to any text file they save as UTF-8.  Is anyone surprised?

Another, related issue is this:
If you edit an HTML page and save it as UTF-8 using, for example, Notepad, 
it will always prefix the file with such a totally superfluous BOM.
If you later serve this page with Apache or Tomcat to an Internet 
Explorer browser, then no matter which HTTP Content-Type + charset 
header you send, Internet Explorer will see the BOM and decide that this 
page is encoded in UTF-8, no matter what any meta tag in the page says.
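
One way around this is to check the file for the BOM before it is served. The following is only a minimal sketch in Java, not anything Tomcat provides: the class and method names (BomSketch, hasUtf8Bom, stripUtf8Bom) are made up for illustration. The one hard fact it relies on is that the UTF-8 BOM is the three bytes EF BB BF, which is U+FEFF encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Tomcat or the JDK.
class BomSketch {
    // U+FEFF encoded as UTF-8: the three bytes Notepad puts at the start of the file.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data starts with the UTF-8 BOM.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Returns a copy without the leading BOM, or the same array if there is none.
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data)
                ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length)
                : data;
    }

    public static void main(String[] args) {
        byte[] saved = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'p', '>'};
        byte[] clean = stripUtf8Bom(saved);
        System.out.println("BOM present: " + hasUtf8Bom(saved));
        System.out.println("Content: " + new String(clean, StandardCharsets.UTF_8));
    }
}
```

Working on the raw bytes, before any decoding, avoids having to guess the charset first. Whether to strip the BOM at save time, at deploy time or in a filter is a separate choice that this sketch does not make.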

> In my opinion, the whole character-set is a pain in the ass:
I agree with that.

> I personally wish IETF came up with some specs saying something like
> "the first n bytes of any stream have to be encoded in ASCII containing
> length and encoding-type of the rest of the stream".
I agree with that too, in general terms.
I believe that any file, any stream, should start with such a prefix, 
indicating at least the file's MIME type, charset and encoding (size may 
be unknown at that point), with a default of "text/plain", Unicode and UTF-8.
I also believe there should be a HTTP 2.0 specification, specifying in 
clear terms a default Unicode/UTF-8 encoding for URLs, html pages, form 
data submission and so on, and a non-ambiguous way of deviating from that.

The problem is in bringing this about.
> I put that on my wishlist for xmas.
That's nice, but you would have to start by convincing Santa Claus.
