tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: UTF-8 handling differs between two servlets within the same application
Date Mon, 23 Jun 2008 17:28:13 GMT

Christopher Schultz wrote:
> Youssef,
> 
> Youssef Mohammed wrote:

Guys,
I am sorry to butt in again, but are you *really* sure that the problem
is not earlier in the chain than you think?
I have read the article at the link given earlier:
http://wiki.apache.org/tomcat/Tomcat/UTF-8
and I am quite sure that what is said in that article is wrong, or at 
least incomplete.  The article seems to assume that whatever the browser 
sends is always iso-8859-1, and that at the server level you can then 
just go and "decode" it into utf-8.  That is wrong, I can assure you. 
Browsers will send utf-8 if the right conditions are met, and you will 
corrupt that data if you force it through a second encoding/decoding.
Browsers will also sometimes send iso-8859-1, if you are not careful or
if the browser is buggy. It happens.  (iso-8859-1 is the default in
HTTP, so if you do not specify otherwise, that is what you'll get).
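
To make the double-decoding point concrete, here is a minimal sketch in plain Java. It assumes the browser really did send UTF-8. The class and method names are hypothetical and are not taken from Tomcat or from the wiki article. The "repair" step (re-encode as iso-8859-1, then decode as utf-8) only helps when the string was mis-decoded in the first place; applied to a string that is already correct, it destroys the accented character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or of the wiki article.
class DoubleDecodeDemo {
    // Bytes a browser puts on the wire when it submits this text as UTF-8.
    static byte[] wireBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // What the server holds if it wrongly decodes those bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The usual "repair": re-encode as ISO-8859-1, then decode as UTF-8.
    static String reinterpretAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Andr\u00e9"; // "André", escaped to be source-encoding safe
        String misdecoded = decodeAsLatin1(wireBytes(original));

        // Mis-decoded: the single e-acute has become two characters.
        System.out.println(misdecoded);
        // Repair of a mis-decoded string recovers the original.
        System.out.println(reinterpretAsUtf8(misdecoded).equals(original));
        // Same repair on an already correct string no longer matches it.
        System.out.println(reinterpretAsUtf8(original).equals(original));
    }
}
```

So the repair is only safe if you know for certain which charset the container used to decode the request, which is exactly what is in doubt here.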

In an ideal world, when a browser sends a string parameter via a POST,
each parameter value should be enclosed in a part with a header and a
body. The header of the part should have a line
Content-type: text/plain; charset=xxxxx
and the body of that part should then be encoded in that xxxxx charset.

The receiving server should decode each part of the POST, and if it does
its job right, should look at the Content-type header, and use it to
decode the corresponding parameter into Unicode (if it is not already),
because that is what request.getParameter() is expected to return,
since Java's internal charset is Unicode.

I know you may have examined the value sent, using some snooping
software. But even if the value is the same in terms of bytes, if the
Content-type header is different, the final result may not be what you
expect.
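
As a rough illustration of the decoding step described above, the sketch below picks the charset out of a part's Content-type value and uses it to turn the raw bytes into a Java String. This is not how Tomcat does it internally (I have not checked); PartDecoder, charsetOf and decode are hypothetical names, and the fallback to iso-8859-1 when no charset is given is my assumption, based on the HTTP default mentioned earlier.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; a sketch only, not Tomcat's actual implementation.
class PartDecoder {
    // Returns the charset named in a Content-type value such as
    // "text/plain; charset=UTF-8", or ISO-8859-1 if none is given (assumed default).
    // Charset.forName throws if the name is illegal or unsupported.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "").trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes the raw bytes of a part according to its declared charset.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        // The same two bytes on the wire in every case: 0xC3 0xA9.
        byte[] body = {(byte) 0xC3, (byte) 0xA9};

        // Declared as UTF-8: one character (e-acute).
        System.out.println(decode(body, "text/plain; charset=UTF-8").length());
        // Declared as ISO-8859-1: two characters.
        System.out.println(decode(body, "text/plain; charset=ISO-8859-1").length());
        // No charset declared: falls back to ISO-8859-1, again two characters.
        System.out.println(decode(body, "text/plain").length());
    }
}
```

The bytes are identical in all three calls; only the declared charset changes, and with it the string the servlet would end up seeing.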

It is quite possible that Tomcat's innards do not do things correctly 
when they decode a POST, and just deliver the raw parameter value as 
received.  But that would surprise me, and I would submit that it would 
then be a bug.

André

