tomcat-dev mailing list archives

From "Tony LaPaso" <tlap...@comcast.net>
Subject TC 5.0.14 Breaks UTF-8 Content via HTTP Header
Date Tue, 11 Nov 2003 04:15:56 GMT
Hi everyone,

It seems a change in TC v5.0.14 may have broken the serving of UTF-8
documents. Specifically, one of the HTTP response headers seems wrong. I'd
like to describe what I'm seeing in TC v5.0.14 compared with v5.0.12.

For both v5.0.12 and v5.0.14 I'm running TC as it comes "out of the box"
i.e., with no changes to the default configurations.

In both cases I tested with four browsers (IE 5, IE 6, Netscape 7.1 and
Firebird 0.7), all on Win 2K.


Here's What I Did
-----------------
In both versions of TC, I added an "em dash" character to the
"/tomcat-docs/cgi-howto.html" document that comes with the TC documentation.
The UTF-8 representation of the "em dash" character is the three bytes
0xE2 0x80 0x94. I also made sure both copies of the document had the
following META tag in the <head>:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>

I then saved each document as UTF-8 (without a BOM). Finally, I opened each
document in a hex editor to check that the em dash was properly encoded as
three bytes (which it was). This indicated to me that the documents were
indeed encoded as UTF-8.
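
The hex-editor check can also be reproduced in code. The following is a
minimal sketch, not part of the original test: the class and method names
(EmDashBytes, utf8Hex) are made up for illustration. It encodes U+2014 (the
em dash) as UTF-8 and prints the resulting bytes in hex.

```java
import java.nio.charset.StandardCharsets;

public class EmDashBytes {
    // Hypothetical helper: returns the UTF-8 bytes of the text as uppercase hex.
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative bytes format as two hex digits.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // \u2014 is the em dash character.
        System.out.println(utf8Hex("\u2014")); // prints E28094
    }
}
```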


Here's What I Saw (TC v5.0.12)
------------------------------
Under TC v5.0.12, everything looked great using all browsers -- the "em
dash" was rendered correctly. I put a sniffer on the HTTP stream. The
v5.0.12 Coyote Connector was sending this HTTP response header:
Content-Type: text/html


Here's What I Saw (TC v5.0.14)
------------------------------
Under TC v5.0.14 the "em dash" character was rendered as *THREE SEPARATE
CHARACTERS* (one for each byte). Moreover, a sniffer on the HTTP stream
showed that the v5.0.14 Coyote Connector was sending this response header:
Content-Type: text/html;charset=ISO-8859-1
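
The "three characters" symptom is what one would expect if the browser
decodes the bytes using the charset named in that header. As a rough
illustration (the class and method names here are invented for the example),
decoding the same three bytes as UTF-8 yields one character, while decoding
them as ISO-8859-1 yields three:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodedEmDash {
    // Hypothetical helper: number of chars produced by decoding with a charset.
    static int charCount(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of the em dash (U+2014).
        byte[] emDash = {(byte) 0xE2, (byte) 0x80, (byte) 0x94};

        // Decoded as UTF-8, the three bytes form a single character.
        System.out.println(charCount(emDash, StandardCharsets.UTF_8));      // prints 1

        // ISO-8859-1 maps every byte to its own character.
        System.out.println(charCount(emDash, StandardCharsets.ISO_8859_1)); // prints 3
    }
}
```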


Aside
-----
For the heck of it I re-saved the v5.0.14 UTF-8 document with a BOM
(0xEFBBBF). Doing this made IE correctly render it but Netscape and Firebird
still had problems. I'm pretty sure that Unicode says the BOM is optional
anyway.
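
For anyone repeating the BOM experiment, a small check like the one below
can confirm whether a saved file actually starts with the UTF-8 BOM bytes
0xEF 0xBB 0xBF. This is only a sketch; Utf8Bom and startsWithUtf8Bom are
illustrative names, and the byte arrays stand in for real file contents.

```java
public class Utf8Bom {
    // Hypothetical helper: true if the data begins with EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // Stand-ins for file contents: "<" with and without a leading BOM.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<'};
        byte[] withoutBom = {'<'};

        System.out.println(startsWithUtf8Bom(withBom));    // prints true
        System.out.println(startsWithUtf8Bom(withoutBom)); // prints false
    }
}
```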


Conclusion (?)
--------------
It seems that v5.0.14 of the Coyote Connector is sending the wrong response
header. I'm not sure what the HTTP spec says *should* be sent for the header
if the document's <head> contains:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>

My guess is that either the response header in v5.0.14 needs to be changed
to:
Content-Type: text/html;charset=UTF-8

or possibly:

Content-Type: text/html

as it was with TC v5.0.12.
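
To make the difference between these header values concrete, here is a
rough sketch of pulling the charset parameter out of a Content-Type value.
It is a simplified illustration, not how Tomcat or any browser actually
parses the header, and the names (ContentTypeCharset, charsetOf) are
invented. A value with no charset parameter yields nothing, which would
leave the browser free to fall back on other hints such as the META tag.

```java
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {
    // Hypothetical helper: extracts the charset parameter, if any.
    static Optional<String> charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the rest are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html"));                    // prints Optional.empty
        System.out.println(charsetOf("text/html;charset=ISO-8859-1")); // prints Optional[ISO-8859-1]
        System.out.println(charsetOf("text/html;charset=UTF-8"));      // prints Optional[UTF-8]
    }
}
```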

Can anyone comment? Is this a TC v5.0.14 bug? It would seem to be.

Thanks,

Tony







