tomcat-dev mailing list archives

From Hans Bergsten <h...@gefionsoftware.com>
Subject Re: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date Wed, 14 Feb 2001 19:47:17 GMT
> Adalbert Wysocki wrote:
> 
> > > You will still need to fix the actual parameter parsing routine to delay
> > > applying the encoding until the name and parameter are parsed out of the
> > > input stream...
> >
> > Yes, most of this is already done. It also has a very nice performance
> > > implication - since the String is converted and allocated only when and if
> > it's needed.
> >
> > The only missing part is the "internationalization" module that detects
> > the encoding ( charset and accept-language parsing doesn't look good
> > > either in the current code ), and putting the pieces together.
> 
> The problem is that browsers do not send the charset used to encode the form's
> parameters; they send the request with the Content-Type header
> application/x-www-form-urlencoded. The charset should follow the media type,
> e.g. "application/x-www-form-urlencoded; charset=UTF-8", but in most cases it
> does not.

Right.
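For illustration, a minimal sketch of looking for the charset parameter in a
Content-Type header value. The class and method names are hypothetical, not
Tomcat code, and the parsing is deliberately simplified (it does not handle
every quoting rule for header parameters). For a typical browser form post it
just returns null, which is exactly the problem described above.

```java
// Hypothetical helper for illustration only; not part of Tomcat.
class ContentTypeCharset {

    // Returns the value of the charset parameter, or null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What the header should look like, and what browsers usually send.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
    }
}
```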

> From my point of view, instead of implementing a routine that analyses the
> request header to extract the charset of the data (which has little chance
> of really working), it would be better to adopt the following policy:
> 
>  * We assume that the request parameters are encoded with the same charset
> as the response to this request. If the servlet generates a result page
> encoded with the Shift_JIS charset, it is reasonable to assume that the
> incoming form data used to generate the page is also encoded with the
> Shift_JIS charset.
> 
>  * While decoding the parameters, instead of assuming that each url-encoded
> entity (%XX) is one character to be decoded, we append all characters as
> bytes and then decode the full parameter string using the encoding set on
> the response
> (javax.servlet.http.HttpServletResponse.setCharacterEncoding(String)).
> 
>  * The response encoding must be set on the response object before the first
> call to one of the following methods (which is when the parameters are parsed):
> 
>     - javax.servlet.http.HttpServletRequest.getParameter(String)
>     - javax.servlet.http.HttpServletRequest.getParameterNames()
>     - javax.servlet.http.HttpServletRequest.getParameterValues(String)
> 
>    If the charset has not been set on the response object when one of the
> methods listed above is called, the parameters are decoded using the JVM's
> default encoding.
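
For reference, the byte-accumulating decode described in the second bullet
could be sketched roughly as below. This is only an illustration under stated
assumptions: FormDecoder and its methods are hypothetical names, not Tomcat or
Resin code, and the charset is passed in by the caller, so the sketch says
nothing about where that charset should come from.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Tomcat or Resin code.
class FormDecoder {

    // Decodes one url-encoded name or value. Every %XX escape is stored as a
    // raw byte; the charset is applied once, to the complete byte sequence,
    // so multi-byte characters are reassembled correctly.
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(encoded.length());
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == '+') {
                bytes.write(' ');              // '+' stands for a space in form data
                i++;
            } else if (c == '%' && i + 2 < encoded.length()
                    && hexValue(encoded.charAt(i + 1)) >= 0
                    && hexValue(encoded.charAt(i + 2)) >= 0) {
                int hi = hexValue(encoded.charAt(i + 1));
                int lo = hexValue(encoded.charAt(i + 2));
                bytes.write((hi << 4) | lo);   // one escape yields one byte
                i += 3;
            } else {
                bytes.write(c);                // assumption: unescaped chars are ASCII
                i++;
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    // Returns 0..15 for an ASCII hex digit, or -1 for anything else.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // Three escapes, three bytes, but a single character in UTF-8.
        String s = decode("%E3%81%82", StandardCharsets.UTF_8);
        System.out.println(s.length() + " char, U+" + Integer.toHexString(s.charAt(0)));
    }
}
```

Decoding each %XX on its own as a character would turn that value into three
separate characters instead of one.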

I'm afraid I have to -1 this proposal. Sure, it may be a nice feature but it's
not defined by Servlet 2.2. And, for better or for worse, TC 3.x is the
Reference Implementation for Servlet 2.2. If we add this behavior to TC 3.x,
a servlet that takes advantage of it will not be portable to other
spec-compliant 2.2 containers.

Servlet 2.3 defines how to deal with this, and this proposal is not in line
with what's in the Servlet 2.3 PFD. It would be a bad idea to add a solution
to the RI for 2.2 that's not compatible with the specified behavior for 2.3.

> NB: This policy is used in Caucho's Resin servlet engine and it works fine.
>     The modifications to the Tomcat code are basic and the risk of impacting
> the core processing is low.

Container vendors are free to add features, even though it's probably not a
good idea for them to add features that break spec compliance ;-)

Hans
-- 
Hans Bergsten		hans@gefionsoftware.com
Gefion Software		http://www.gefionsoftware.com
Author of JavaServer Pages (O'Reilly), http://TheJSPBook.com
