From: André Warnier
Date: Sat, 16 Aug 2008 00:13:17 +0200
To: Tomcat Users List
Subject: Re: URIEncoding UTF-16 problem
Message-ID: <48A5FF7D.6000801@ice-sa.com>
In-Reply-To: <48A59CF2.6060600@christopherschultz.net>
References: <30FB1556446A63478616992ECB0E151B01C20089@gms3i004-1.orbian.priv>
 <48A572B1.5080401@ice-sa.com> <48A59CF2.6060600@christopherschultz.net>

Christopher Schultz wrote:
> André,
>
> André Warnier wrote:
>> Could you tell us *why* exactly you [are trying to use UTF-16]?
>> It is rather unusual, as it supposes that you expect all clients to
>> encode their requested URI's in UTF-16 prior to sending the request to
>> Tomcat on that connector. To my knowledge, no standard client (browser)
>> will ever do so.
>
> ...at least not on the first request.
>
> The beauty of using an encoding like UTF-8 is that ASCII is a strict
> subset: any plain-old ASCII request can be interpreted as a UTF-8
> request, which means that if you want to use UTF-8 on your site, but
> your visitors come in using ASCII, there's no problem (unless they have
> weird characters in their first request, which is rare).

The OP is talking about UTF-16, not UTF-8.

What you are saying above about ASCII/UTF-8 is true if one restricts
oneself strictly to 7-bit US-ASCII. That's OK for English, but not for
almost any other language on this planet.
The default charset on the Web is iso-8859-1 (latin-1), not US-ASCII.
Any iso-8859-1 character whose code point is 128 decimal or above does
not encode as a single byte in UTF-8.
My own first name, expressed in the Unicode alphabet and encoded in
UTF-8, occupies 6 bytes, not 5. Encoded as UTF-16, it occupies 10 bytes
(12 with a byte order mark), and half of those 10 bytes have a hex value
of 00.

Now about the "first request" bit: not on the first request, nor on any
subsequent request, unless the server finds a way to tell the client
that it only accepts requests with URIs encoded as UTF-16, and the
browser not only understands the instruction, but obeys it.
If there is an accepted and supported way to do that, I'd be glad to
hear it, as it would solve a lot of practical web
internationali(z/s)ation problems.

So, back to the original question: why set the connector to UTF-16 URI
encoding? That will almost guarantee that Tomcat will not properly
understand any URL requested by a standard browser.

André
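
For anyone who wants to check the byte counts discussed above, here is a minimal Java sketch. The class and method names (UriEncodingDemo, encodedLength, hex, misdecode) are made up for this illustration, and misdecode is not Tomcat's code: it only mimics, in a simplified way, what happens when bytes are decoded with a charset other than the one the client used. Note that Java's "UTF-16" charset writes a 2-byte byte order mark when encoding, while "UTF-16BE" does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {

    // Number of bytes the text occupies once encoded in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Bytes rendered as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Hypothetical helper: the client encodes with 'sent', the server
    // decodes the same bytes with 'assumed'.
    static String misdecode(String text, Charset sent, Charset assumed) {
        return new String(text.getBytes(sent), assumed);
    }

    public static void main(String[] args) {
        // "André", written with an escape so the source file encoding does not matter.
        String name = "Andr\u00e9";

        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, // 5 bytes
            StandardCharsets.UTF_8,      // 6 bytes
            StandardCharsets.UTF_16BE,   // 10 bytes, no byte order mark
            StandardCharsets.UTF_16      // 12 bytes, byte order mark included
        };
        for (Charset cs : charsets) {
            byte[] encoded = name.getBytes(cs);
            System.out.println(cs.name() + ": " + encoded.length
                    + " bytes: " + hex(encoded));
        }

        // A plain ASCII path read as UTF-16: every two bytes collapse
        // into one unrelated character, so the path is unrecognizable.
        String seen = misdecode("/app", StandardCharsets.US_ASCII,
                StandardCharsets.UTF_16BE);
        System.out.println("\"/app\" decoded as UTF-16BE has "
                + seen.length() + " characters");
    }
}
```

The last call is the point of the question about the connector: the 4 ASCII bytes of "/app" come out as 2 characters that have nothing to do with the path the browser asked for.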