tomcat-dev mailing list archives

From Tim Funk <funk...@joedog.org>
Subject Re: UTF-8 POST request results in corrupted data
Date Tue, 07 Oct 2008 10:37:25 GMT
If you take that form and post it - how does the server know that the 
content is UTF-8? (Answer: it doesn't)

The HTML directives tell the browser to encode everything as UTF-8 on 
the way to the web server. But nothing tells the web server explicitly 
what the charset of the incoming request is.

See the servlet spec for more details, in particular:
4.9 Request data encoding
Currently, many browsers do not send a char encoding qualifier with the 
Content-Type header, leaving open the determination of the character 
encoding for reading HTTP requests. The default encoding of a request 
the container uses to create the request reader and parse POST data must 
be “ISO-8859-1” if none has been specified by the client request. 
However, in order to indicate to the developer in this case the failure 
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method.
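
To make the effect of that default concrete, here is a minimal sketch using 
only the JDK (no servlet container). It percent-encodes a value the way a 
browser would for a UTF-8 page, then decodes the body once as ISO-8859-1 and 
once as UTF-8. The class and method names are made up for illustration, and 
the Charset overloads of URLEncoder/URLDecoder assume Java 10 or later; this 
is not Tomcat's actual parsing code.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates what a container does with a form body.
final class FormDecodingDemo {

    // Browser side: the page is UTF-8, so the UTF-8 bytes get percent-encoded.
    static String encodeAsBrowser(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Server side when no charset was sent: fall back to ISO-8859-1.
    static String decodeWithSpecDefault(String body) {
        return URLDecoder.decode(body, StandardCharsets.ISO_8859_1);
    }

    // Server side when it has been told the body is UTF-8.
    static String decodeAsUtf8(String body) {
        return URLDecoder.decode(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String body = encodeAsBrowser(typed);

        System.out.println("body on the wire: " + body);
        System.out.println("chars after ISO-8859-1 decode: "
                + decodeWithSpecDefault(body).length());
        System.out.println("UTF-8 decode matches input: "
                + decodeAsUtf8(body).equals(typed));
    }
}
```

Each Japanese character is three bytes in UTF-8, so the ISO-8859-1 decode 
yields nine unrelated characters instead of three. The bytes on the wire are 
identical in both cases; only the charset used to interpret them differs.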


-Tim

Andre-John Mas wrote:
> Thanks for the answer on this point. Reading section 3.7.1 of RFC 2616 
> indicates that a request can specify a character set other than the 
> default. For this reason the following should technically be legal:
> 
> <form action="" method="post" 
> enctype="application/x-www-form-urlencoded; charset=utf-8" 
> accept-charset="utf-8">
> 
> What I see, from testing on my Mac, is that Firefox and Safari fail to 
> pass the charset attribute, but Opera does. What I do notice here is 
> that even though Opera does specify the character set, Tomcat ignores it, 
> replacing the submitted Japanese characters with question marks. This is 
> an indication that UTF-8 was accepted but it was converted to ISO-8859-1 
> and no equivalent mapping was available. With 
> Firefox and Safari I get the same behaviour when I specify:
> 
>    request.setCharacterEncoding("UTF-8");
> 
> Basically I am not getting the Japanese characters as typed in the form. 
> There is a problem here.
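
The question marks described in the quoted message are what the JDK produces 
when text is encoded into a charset that cannot represent it. A minimal 
sketch, assuming plain JDK string conversion rather than anything 
Tomcat-specific (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing lossy encoding into ISO-8859-1.
final class QuestionMarkDemo {

    // Encode to ISO-8859-1 bytes and read them back with the same charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for every character it cannot map; for ISO-8859-1 that byte is '?'.
    static String roundTripThroughLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Japanese has no ISO-8859-1 mapping, so every character is replaced.
        System.out.println(roundTripThroughLatin1("\u65e5\u672c\u8a9e"));
        // Characters that ISO-8859-1 does cover survive unchanged.
        System.out.println(roundTripThroughLatin1("caf\u00e9"));
    }
}
```

Seeing exactly one "?" per typed character suggests the input was decoded 
correctly at some point and lost later, during a conversion to ISO-8859-1. 
A wrong decode of the request body would instead give several garbage 
characters per typed character.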

