From: "Tony LaPaso"
Subject: TC 5.0.14 Breaks UTF-8 Content via HTTP Header
Date: Mon, 10 Nov 2003 22:15:56 -0600
List-Id: "Tomcat Developers List"

Hi everyone,

It seems a change in TC v5.0.14 may have broken the serving of UTF-8
documents. Specifically, one of the HTTP response headers seems wrong.

I'd like to describe what I'm seeing in TC v5.0.14 compared with
v5.0.12. For both v5.0.12 and v5.0.14 I'm running TC as it comes "out
of the box", i.e., with no changes to the default configuration.
In both cases I tested with four browsers (IE 5, IE 6, Netscape 7.1 and
Firebird 0.7), all on Win 2K.

Here's What I Did
-----------------
In both versions of TC, I added an "em dash" character to the
"/tomcat-docs/cgi-howto.html" document that comes with the TC
documentation. The UTF-8 representation of the "em dash" character is
the three bytes 0xE28094.

I also made sure both documents had a META tag in the <head> declaring
the UTF-8 charset.

I then saved the documents as UTF-8 (without a BOM). Finally, I opened
each document in a hex editor to check that the em dash was properly
encoded as three bytes (which it was). This indicated to me that the
document was indeed encoded as UTF-8.

Here's What I Saw (TC v5.0.12)
------------------------------
Under TC v5.0.12, everything looked great in all browsers -- the "em
dash" was rendered correctly. I put a sniffer on the HTTP stream. The
v5.0.12 Coyote Connector was sending this HTTP response header:

    Content-Type: text/html

Here's What I Saw (TC v5.0.14)
------------------------------
Under TC v5.0.14 the "em dash" character was rendered as *THREE
SEPARATE CHARACTERS* (one for each byte). Moreover, putting a sniffer
on the HTTP stream showed the following response header being sent by
the v5.0.14 Coyote Connector:

    Content-Type: text/html;charset=ISO-8859-1

Aside
-----
For the heck of it I re-saved the v5.0.14 UTF-8 document with a BOM
(0xEFBBBF). Doing this made IE render it correctly, but Netscape and
Firebird still had problems. I'm pretty sure that Unicode says the BOM
is optional anyway.

Conclusion (?)
--------------
It seems that v5.0.14 of the Coyote Connector is sending the wrong
response header. I'm not sure what the HTTP spec says *should* be sent
for the header if the document's <head> contains a META tag declaring
UTF-8.

My guess is that the response header in v5.0.14 needs to be changed
either to:

    Content-Type: text/html;charset=UTF-8

or possibly to:

    Content-Type: text/html

as it was with TC v5.0.12.

Can anyone comment?
Is this a TC v5.0.14 bug? It would seem to be.

Thanks,
Tony
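
The three-character symptom described above can probably be reproduced
without Tomcat or a browser. The following is only a minimal sketch in
plain Java: it encodes an em dash (U+2014) as UTF-8, prints the bytes,
and then decodes the same bytes under each of the two charsets at issue.
The class name EmDashDecoding and the helper toHex are made up for
illustration and do not come from Tomcat.

```java
import java.nio.charset.StandardCharsets;

public class EmDashDecoding {

    // Hypothetical helper: renders bytes as uppercase hex, e.g. E28094.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2014 is the em dash; written as an escape to keep the source ASCII.
        String emDash = "\u2014";

        // What the file on disk contains when saved as UTF-8.
        byte[] utf8 = emDash.getBytes(StandardCharsets.UTF_8);

        // What a client sees if it trusts charset=UTF-8.
        String asUtf8 = new String(utf8, StandardCharsets.UTF_8);

        // What a client sees if it trusts charset=ISO-8859-1:
        // every byte becomes its own character.
        String asLatin1 = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes: 0x" + toHex(utf8));
        System.out.println("chars decoded as UTF-8: " + asUtf8.length());
        System.out.println("chars decoded as ISO-8859-1: " + asLatin1.length());
    }
}
```

If this matches what the browsers are doing, it would suggest the file
itself is fine and that the charset parameter in the response header is
what decides how the three bytes get interpreted.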