tomcat-dev mailing list archives

From cmanola...@yahoo.com
Subject RE: charset used for parameters decoding on HTTP request Tomcat 3.x, 4
Date Wed, 14 Feb 2001 16:01:59 GMT
> 
> The problem is that browsers do not send the charset used to encode the
> form's parameters; they send the request with the Content-Type header
> application/x-www-form-urlencoded. The charset should follow the encoding
> type, e.g. "application/x-www-form-urlencoded; charset=UTF8", but in most
> cases it does not.

I know. But that's the standard, and we have to follow it first.
If that fails (and it will, in most browsers that ignore the standards),
then we can try workarounds.
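
As an illustration of the "follow the standard first" step, here is a minimal sketch of reading the ";charset=" parameter from a Content-Type value. The class and method names (ContentTypeCharset, charsetOf) are hypothetical and are not Tomcat code; the sketch only handles the simple "type; name=value" form and returns null when no charset is present, so that a caller can move on to workarounds.

```java
import java.util.Locale;

class ContentTypeCharset {
    /**
     * Returns the charset named in a Content-Type header value,
     * or null when the header is missing or carries no charset.
     * Hypothetical helper for illustration only.
     */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // parts[0] is the media type; the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return name.isEmpty() ? null : name;
            }
        }
        return null;
    }
}
```

A null result is the common case described above (the browser sent no charset), and is the signal to try the fallbacks.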


> From my point of view, instead of implementing a routine in charge of
> analysing the request header to extract the data's encoding charset (few
> chances for it to really work), it would be better to adopt the following
> policy:

There is no "instead" here - in addition to the ";charset=" we can do
many things.


>  * we suppose that the request's parameters encoding is the same as the
> content encoding of the response to this request. If the servlet processing
> generates a result page encoded with the Shift_JIS charset, it is reasonable
> to suppose that the incoming form data used for the page generation is
> encoded with the Shift_JIS charset.
>...
> (javax.servlet.http.HttpServletResponse.setCharacterEncoding(String)).
>...

That's a good idea - thanks Adalbert. 
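
A rough sketch of how that policy could sit behind the standard check: use the charset declared by the request if there is one, otherwise the charset of the response, otherwise a default. Everything here (ParameterDecoding, pickCharset, decode, and the idea of passing the three charsets in as plain strings) is an assumption made for illustration, not how Tomcat does it; only java.net.URLDecoder.decode(String, String) is a real JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class ParameterDecoding {
    /** First non-null candidate wins: request, then response, then default. */
    static String pickCharset(String requestCharset, String responseCharset, String defaultCharset) {
        if (requestCharset != null) {
            return requestCharset;   // the standard ";charset=" case
        }
        if (responseCharset != null) {
            return responseCharset;  // the policy suggested above
        }
        return defaultCharset;       // last resort
    }

    /** Decodes one url-encoded parameter value with the chosen charset. */
    static String decode(String raw, String requestCharset, String responseCharset, String defaultCharset) {
        String charset = pickCharset(requestCharset, responseCharset, defaultCharset);
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: report it instead of silently guessing.
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }
}
```

For example, decode("caf%E9", null, "ISO-8859-1", "UTF-8") falls back to the response charset because the request declared none.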

There are a few other tricks we can try (in addition to this one), and in
time we can hope that browsers will follow the standards.

BTW, another small improvement would be to specify an encoding per
application (instead of defaulting to the platform or UTF).
And one may guess the charset from the Accept-Language (in some cases).
A very common mechanism seems to be a "charset" parameter in the request
(it seems it is possible to do a javascript trick in the page to add
a hidden param with the current browser encoding).
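
The last two hints could look roughly like the sketch below. It is only a guess at the shape of such code: the class CharsetHints, its method names, and above all the language-to-charset table are hypothetical, and the table entries are illustrative guesses rather than reliable rules. The "charset" parameter is read from the raw, still-encoded form data, on the assumption that the charset name itself is plain ASCII.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CharsetHints {
    // Hypothetical table: illustrative guesses only, not a reliable mapping.
    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("ru", "KOI8-R");
        BY_LANGUAGE.put("en", "ISO-8859-1");
    }

    /** Returns the part of s before the first occurrence of c, or s itself. */
    private static String before(String s, char c) {
        int i = s.indexOf(c);
        return i < 0 ? s : s.substring(0, i);
    }

    /** Looks for an explicit "charset=..." pair in raw (undecoded) form data. */
    static String fromCharsetParam(String rawForm) {
        if (rawForm == null) {
            return null;
        }
        for (String pair : rawForm.split("&")) {
            if (pair.startsWith("charset=") && pair.length() > "charset=".length()) {
                return pair.substring("charset=".length());
            }
        }
        return null;
    }

    /**
     * Guesses a charset from the first language listed in Accept-Language.
     * Ignores q-values for brevity; returns null when there is no guess.
     */
    static String fromAcceptLanguage(String header) {
        if (header == null) {
            return null;
        }
        // "ja;q=0.9,en" -> "ja"; "en-US,en" -> "en-us" -> "en"
        String tag = before(before(header, ','), ';').trim().toLowerCase(Locale.ROOT);
        return BY_LANGUAGE.get(before(tag, '-'));
    }
}
```

Both methods return null when they have nothing to offer, so they can be tried after the ";charset=" check without overriding it.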

I'll start working on that in 1-2 weeks, and any suggestion (like this
one) will help.

Costin

