tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: URIEncoding UTF-16 problem
Date Fri, 15 Aug 2008 22:13:17 GMT
Christopher Schultz wrote:
> André,
> 
> André Warnier wrote:
>> Could you tell us *why* exactly you [are trying to use UTF-16]?
>> It is rather unusual, as it supposes that you expect all clients to
>> encode their requested URI's in UTF-16 prior to sending the request to
>> Tomcat on that connector.  To my knowledge, no standard client (browser)
>> will ever do so.
> 
> ...at least not on the first request.
> 
> The beauty of using an encoding like UTF-8 is that ASCII is a strict
> subset: any plain-old ASCII request can be interpreted as a UTF-8
> request, which means that if you want to use UTF-8 on your site, but
> your visitors come in using ASCII, there's no problem (unless they have
> weird characters in their first request, which is rare).

The OP is talking about UTF-16, not UTF-8.

What you are saying above about ASCII/UTF-8 is true, if one restricts 
oneself strictly to 7-bit US-ASCII.  That's OK for English, but not OK 
for almost any other language on this planet.
The default charset on the Web is iso-8859-1 (latin-1), not US-ASCII. 
Any character of iso-8859-1 whose codepoint is 128 decimal or above does 
not encode as a single byte in UTF-8. My own name (André), expressed in 
the Unicode alphabet and encoded in UTF-8, occupies 6 bytes, not 5.
Encoded as UTF-16, it occupies 10 bytes, half of which have a hex value 
of 00 (12 bytes if a byte order mark is prepended).
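
To make the byte counts above easy to check, here is a minimal Java sketch. The class and method names (EncodingSizes, byteCount, hex) are made up for illustration and are not part of Tomcat. It encodes the 5-character name "André" in several charsets and prints the size and bytes of each. Note that Java's "UTF-16" charset prepends a 2-byte byte order mark when encoding, while "UTF-16BE" does not.

```java
import java.nio.charset.Charset;

class EncodingSizes {
    // Number of bytes the text occupies when encoded in the named charset.
    static int byteCount(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    // Space-separated hex dump of the encoded bytes.
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Andr\u00e9"; // "André", written with an escape to be source-encoding safe
        String[] charsets = {"ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16"};
        for (String cs : charsets) {
            System.out.println(cs + ": " + byteCount(name, cs)
                    + " bytes -> " + hex(name, cs));
        }
    }
}
```

In iso-8859-1 the "é" is the single byte E9; in UTF-8 it becomes the two bytes C3 A9; in UTF-16BE every character takes two bytes, and for this name the high byte of each is 00.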

Now about the "first request" bit: not on the first request, nor on any 
subsequent request, unless the server finds a way to tell the 
client that it only accepts requests with URIs encoded as UTF-16, 
and the browser not only understands the instruction, but obeys it.
If there is an accepted and supported way to do that, I'd be glad to 
hear it, as it would solve a lot of practical web 
internationali(z/s)ation problems.

So, back to the original question: why set the connector to UTF-16 URI 
encoding? That will almost guarantee that Tomcat will not properly 
understand any URL requested by a standard browser.
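
As a rough illustration of that last point, the sketch below percent-decodes a request path into raw bytes and then interprets those bytes with a given charset. It is not Tomcat's actual decoding code; the class UriDecodingSketch, its methods, and the path "/caf%C3%A9" are hypothetical. It assumes the browser percent-encoded the "é" as UTF-8 (%C3%A9) and that the rest of the path is plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

class UriDecodingSketch {
    // Turn %XX escapes into raw bytes; all other characters are assumed
    // to be ASCII and are copied through as single bytes.
    static byte[] percentDecode(String rawUri) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < rawUri.length()) {
            char c = rawUri.charAt(i);
            if (c == '%' && i + 2 < rawUri.length()) {
                out.write(Integer.parseInt(rawUri.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c);
                i++;
            }
        }
        return out.toByteArray();
    }

    // Interpret the decoded bytes using the charset the server is configured for.
    static String decode(String rawUri, String charsetName) {
        return new String(percentDecode(rawUri), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String raw = "/caf%C3%A9"; // hypothetical path as a browser might send it
        System.out.println("as UTF-8 : " + decode(raw, "UTF-8"));
        System.out.println("as UTF-16: " + decode(raw, "UTF-16"));
    }
}
```

Read as UTF-8, the six bytes give back "/café". Read as UTF-16, the same bytes are paired up two by two, so even the leading "/" and the ASCII letters disappear into three unrelated characters, and no path on the server will match.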

André



---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

