tomcat-dev mailing list archives

From "Bill Barker" <wbar...@wilshire.com>
Subject Re: TC 5.0.14 Breaks UTF-8 Content via HTTP Header
Date Tue, 11 Nov 2003 07:06:28 GMT
See inline.

----- Original Message ----- 
From: "Tony LaPaso" <tlapaso@comcast.net>
To: <tomcat-user@jakarta.apache.org>; <tomcat-dev@jakarta.apache.org>
Sent: Monday, November 10, 2003 8:15 PM
Subject: TC 5.0.14 Breaks UTF-8 Content via HTTP Header


> Hi everyone,
>
> It seems a change to TC v5.0.14 may have broken the serving of UTF-8
> documents. Specifically, one of the HTTP headers seems wrong. I'd like to
> describe what I'm seeing in TC v5.0.14 compared with v5.0.12.
>
> For both v5.0.12 and v5.0.14 I'm running TC as it comes "out of the box",
> i.e., with no changes to the default configurations.
>
> In both cases I tested with four browsers (IE 5, IE 6, Netscape 7.1 and
> Firebird 0.7), all on Win 2K.
>
>
> Here's What I Did
> -----------------
> In both versions of TC, I added an "em dash" character to the
> "/tomcat-docs/cgi-howto.html" documents that come with the TC
> documentation. The UTF-8 representation of the "em dash" character is the
> three bytes 0xE28094. I also made sure both documents had the following
> META tag in their <head>:
>
> <meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>
>
> I then saved the documents as UTF-8 (without a BOM). Finally, I brought the
> document into a hex editor to check that the em dash was properly encoded as
> three bytes (which it was). This indicated to me that the document was
> indeed encoded as UTF-8.
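A minimal Java sketch of the hex-editor check described above. It encodes U+2014 EM DASH with the JDK's UTF-8 encoder and prints the resulting bytes. The class name EmDashBytes is only illustrative.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class (hypothetical name): shows the UTF-8 bytes of U+2014 EM DASH.
public class EmDashBytes {
    // Render a byte array as uppercase hex with no separators, e.g. "E28094".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u2014".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes: " + toHex(utf8)); // prints "3 bytes: E28094"
    }
}
```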
>
>
> Here's What I Saw (TC v5.0.12)
> ------------------------------
> Under TC v5.0.12, everything looked great using all browsers -- the "em
> dash" was rendered correctly. I put a sniffer on the HTTP stream. The
> v5.0.12 Coyote Connector was sending this HTTP response header:
> Content-Type: text/html
>
>
> Here's What I Saw (TC v5.0.14)
> ------------------------------
> Under TC v5.0.14 the "em dash" character was rendered as *THREE SEPARATE
> CHARACTERS* (one for each byte). Moreover, putting a sniffer on the HTTP
> stream indicated the following response header was being sent by the
> v5.0.14 Coyote Connector:
> Content-Type: text/html;charset=ISO-8859-1
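This rough Java sketch shows why one character turned into three. It takes the three UTF-8 bytes of the em dash and decodes them as ISO-8859-1, the charset named in the 5.0.14 header. ISO-8859-1 maps every byte to exactly one character, so the result has length 3 (U+00E2, U+0080, U+0094). The class and method names are hypothetical. In practice, browsers usually treat a declared ISO-8859-1 as windows-1252, so the three bytes typically show up as â, € and a closing double quotation mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names: demonstrates UTF-8 bytes misread as ISO-8859-1.
public class Latin1Misdecode {
    // Encode text as UTF-8, then decode those bytes as ISO-8859-1,
    // which is effectively what a client does when the header says charset=ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u2014");
        System.out.println(garbled.length()); // prints 3
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00E2, then U+0080, then U+0094
        }
    }
}
```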
>
>
> Aside
> -----
> For the heck of it I re-saved the v5.0.14 UTF-8 document with a BOM
> (0xEFBBBF). Doing this made IE render it correctly, but Netscape and
> Firebird still had problems. I'm pretty sure that Unicode says the BOM is
> optional anyway.
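Here is a small Java sketch of the BOM test. The UTF-8 form of U+FEFF is the three bytes EF BB BF, and the helper (a hypothetical name) only checks whether a byte array starts with them. A BOM is optional in UTF-8, so a missing BOM does not show that a file is not UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for detecting a UTF-8 byte order mark.
public class Utf8Bom {
    // UTF-8 encoding of U+FEFF (BYTE ORDER MARK).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data begins with the three UTF-8 BOM bytes.
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFF<html>".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "<html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(withBom));    // true
        System.out.println(hasUtf8Bom(withoutBom)); // false
    }
}
```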
>
>
> Conclusion (?)
> --------------
> It seems that v5.0.14 of the Coyote Connector is sending the wrong
> response header. I'm not sure what the HTTP spec says *should* be sent
> for the header if the document's <head> contains:

The spec says nothing about META tags.  Tomcat (correctly) treats them as
just so much output text.

>
> <meta http-equiv='Content-Type' content='text/html; charset=utf-8'/>
>
> My guess is that either the response header in v5.0.14 needs to be changed
> to:
> Content-Type: text/html;charset=UTF-8
>
> or possibly:
>
> Content-Type: text/html
>
> as it was with TC v5.0.12.
>
> Can anyone comment? Is this a TC v5.0.14 bug? It would seem to be.

It looks like a 5.0.12 bug that was subsequently fixed :).  The 2.4
Servlet spec clearly states:
<spec-quote version="Servlet-2.4-pfd3" section="14.2.22">
If no character encoding has been specified, ISO-8859-1
is returned.
</spec-quote>
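The default in that spec quote can be modeled with a short Java sketch. charsetOf is a hypothetical helper, not a Tomcat or Servlet API method. It extracts the charset parameter from a Content-Type value and falls back to ISO-8859-1 when none is given, which roughly follows the rule for ServletResponse.getCharacterEncoding(). Under that rule, plain "text/html" counts as ISO-8859-1, matching what 5.0.14 sent.

```java
import java.util.Locale;

// Hypothetical model of the Servlet 2.4 default-charset rule quoted above.
public class ContentTypeCharset {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Return the charset parameter of a Content-Type value, or ISO-8859-1 if absent.
    static String charsetOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            // Parameter names are case-insensitive, so compare in lower case.
            if (eq > 0 && p.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = p.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html"));                    // ISO-8859-1
        System.out.println(charsetOf("text/html;charset=UTF-8"));      // UTF-8
        System.out.println(charsetOf("text/html; charset=\"utf-8\"")); // utf-8
    }
}
```

A servlet that really emits UTF-8 must declare that itself, for example with response.setContentType("text/html; charset=UTF-8"). The META tag alone does not change the header.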

>
> Thanks,
>
> Tony

