cocoon-users mailing list archives

From Jos Snellings <Jos.Snelli...@pandora.be>
Subject Re: character encoding of a HttpServletRequest
Date Mon, 11 Jan 2010 10:34:08 GMT
That is right!
It is just a confusing situation :-(
The filter works fine. The init() method of a generator does not give a
chance to call setCharacterEncoding, as the parsing has already happened.
The good thing is that the code is already in Spring, so no new
external dependencies. Maybe later on I will add a
"tryToGuessEncodingFilter".

Jos
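
For anyone reading this in the archive: the double conversion described in
the quoted message can be reproduced without a servlet container. The sketch
below is only an illustration. The class and method names are made up for
this example, and it assumes Java 7 or later for StandardCharsets. It decodes
the UTF-8 bytes of 'é' as ISO-8859-1, the way a container does when no
encoding is set on the request, and then reverses the step with an explicit
UTF-8 charset, so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Cocoon or the Servlet API.
public class DoubleEncodingDemo {

    // Simulates a container that decodes UTF-8 form data as ISO-8859-1.
    static String misdecode(String typed) {
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding: recover the raw bytes, then read them as UTF-8.
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u00e9"; // 'é' as entered in the form
        String garbled = misdecode(typed);

        // Two chars after the wrong decoding, four bytes once re-encoded as UTF-8.
        System.out.println(garbled.length());
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);

        // The repaired string equals the original input.
        System.out.println(repair(garbled).equals(typed));
    }
}
```

This is only a workaround for data that has already been decoded. Calling
setCharacterEncoding in a filter, before any parameter is read, avoids the
problem at the source.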

On Mon, 2010-01-11 at 10:49 +0100, Reinhard Pötz wrote:
> Jos Snellings wrote:
> > Hi,
> > 
> > HttpServletRequest looks 'imperfect':
> > Cocoon 3, alpha 2.
> > A generator accesses the HttpServletRequest in the setup method:
> > 
> > request = HttpContextHelper.getRequest(parameters);
> > text = request.getParameter("tekst");
> > 
> > The pages, including forms, are encoded in utf-8.
> > The String 'text' is strange: the original content (utf-8) is encoded
> > once again:
> > if the string on the form was one character, say 'é', the string has a
> > length of 4 bytes. It is the result of utf-8 encoding the two byte
> > character coming from the client. So, a second conversion is happening.
> > 
> > Now:
> > new String(request.getParameter("text").getBytes("ISO-8859-1")) works
> > fine.
> > 
> > Where should this be corrected?
> 
> Jos,
> 
> in Cocoon 3 there isn't any code that changes the encoding of request
> parameters. The plain HttpServletRequest as provided by the servlet
> container is used.
> 
> IIRC Tomcat uses ISO-8859-1 by default which follows the recommendation
> of the Servlet API spec:
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SRV.4.9 Request data encoding
> Currently, many browsers do not send a char encoding qualifier with the
> Content-Type header, leaving open the determination of the character
> encoding for reading HTTP requests. The default encoding of a request
> the container uses to create the request reader and parse POST data must
> be “ISO-8859-1” if none has been specified by the client request.
> However, in order to indicate to the developer in this case the failure
> of the client to send a character encoding, the container returns null
> from the getCharacterEncoding method.
> If the client hasn’t set character encoding and the request data is
> encoded with a different encoding than the default as described above,
> breakage can occur. To remedy this situation, a new method
> setCharacterEncoding(String enc) has been added to the ServletRequest
> interface. Developers can override the character encoding supplied by
> the container by calling this method. It must be called prior to parsing
> any post data or reading any input from the request. Calling
> this method once data has been read will not affect the encoding.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> So as some others suggested, the best option is using one of the
> CharacterEncoding servlet filters and not to remedy this situation
> somewhere in C3.
> 
