tomcat-dev mailing list archives

From Andre-John Mas <>
Subject Re: UTF-8 POST request results in corrupted data
Date Tue, 07 Oct 2008 03:39:49 GMT
Thanks for the answer on this point. Reading section 3.7.1 of RFC 2616
indicates that a request can specify a character set other than the
default. For this reason the following should technically be legal:

<form action="" method="post" enctype="application/x-www-form-urlencoded; charset=utf-8" accept-charset="utf-8">
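
As a rough illustration of what honouring that parameter would mean on
the receiving side, here is a minimal Java sketch. It is not Tomcat's
code: the names FormCharsetSketch, charsetOf and decodeValue are made up
for this example, and it assumes the form value is already available as
a percent-encoded string and that ISO-8859-1 is the fallback when no
charset is given.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;

final class FormCharsetSketch {

    // Hypothetical helper: extract the charset parameter from a
    // Content-Type value, falling back to ISO-8859-1 if there is none.
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return Charset.forName("ISO-8859-1");
    }

    // Hypothetical helper: percent-decode one form value using the
    // charset named in the Content-Type header.
    static String decodeValue(String encoded, String contentType) {
        try {
            return URLDecoder.decode(encoded, charsetOf(contentType).name());
        } catch (UnsupportedEncodingException e) {
            // Not expected: the name comes from a Charset that already exists.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two characters U+65E5 U+672C, percent-encoded as UTF-8.
        String value = "%E6%97%A5%E6%9C%AC";

        String labelled = decodeValue(value,
                "application/x-www-form-urlencoded; charset=utf-8");
        String unlabelled = decodeValue(value,
                "application/x-www-form-urlencoded");

        // With the charset honoured, the two original characters come back.
        System.out.println(labelled.length());
        // Without it, each UTF-8 byte is read as one ISO-8859-1 character.
        System.out.println(unlabelled.length());
    }
}
```

The only point of the sketch is that the same bytes decode differently
depending on whether the charset parameter is read or ignored.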

What I see, from testing on my Mac, is that Firefox and Safari fail to
pass the charset parameter, but Opera does. What I do notice here is
that even though Opera does specify the character set, Tomcat ignores
it, replacing the submitted Japanese characters with question marks.
This suggests that the UTF-8 data was accepted but was then converted
to ISO-8859-1, where no equivalent mapping was available. With
Firefox and Safari I get the same behaviour when I specify:


Basically I am not getting the Japanese characters as typed in the  
form. There is a problem here.
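
To show why question marks point at a conversion to ISO-8859-1 rather
than a simple mis-decoding, here is a small Java sketch. It says nothing
about what Tomcat does internally; the class and method names are
invented for the example. It relies only on the documented behaviour of
String.getBytes(Charset), which substitutes the charset's replacement
byte for characters it cannot map.

```java
import java.nio.charset.Charset;

final class QuestionMarkSketch {

    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Encode text to ISO-8859-1 and back. Characters with no mapping
    // in ISO-8859-1 are replaced by '?' during the encode step.
    static String roundTripThroughLatin1(String text) {
        byte[] bytes = text.getBytes(LATIN1);
        return new String(bytes, LATIN1);
    }

    // Read UTF-8 bytes as if they were ISO-8859-1. Nothing is lost,
    // but each byte turns into one unrelated character.
    static String misdecodeUtf8AsLatin1(String text) {
        return new String(text.getBytes(UTF8), LATIN1);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three Japanese characters

        // Lossy conversion: one '?' per original character.
        System.out.println(roundTripThroughLatin1(japanese));

        // Mis-decoding: nine accented or control characters, no '?'.
        System.out.println(misdecodeUtf8AsLatin1(japanese).length());
    }
}
```

A plain mis-decoding would give garbage characters rather than question
marks, so the question marks suggest the text went through a lossy
encode step somewhere along the way.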


On 6-Oct-08, at 22:22 , William A. Rowe, Jr. wrote:

> Andre-John Mas wrote:
>> Just to repeat what I stated in the ticket:
>> The problem I have with the suggested approach is that it treats
>> UTF-8 as an exception, rather than a norm for my whole application
>> server. I am not sure that I should have to specify the encoding
>> before handling every request. For a web site that is completely in
>> UTF-8 that is a lot of duplicated code.
> Because of rfc 2616 3.7.1;
>   The "charset" parameter is used with some media types to define the
>   character set (section 3.4) of the data. When no explicit charset
>   parameter is provided by the sender, media subtypes of the "text"
>   type are defined to have a default charset value of "ISO-8859-1" when
>   received via HTTP. Data in character sets other than "ISO-8859-1" or
>   its subsets MUST be labeled with an appropriate charset value. See
>   section 3.4.1 for compatibility problems.
>> Also, I ask the question why should we allow one behaviour for the
>> URI in the container and not allow for the same with regards to the
>> POST?
> Because the same does not apply; it's not a specific encoding.
> Header fields are 8859-1 per section 2.2, but URIs aren't defined
> as *TEXT.