cocoon-docs mailing list archives

From stev...@outerthought.org
Subject [WIKI-UPDATE] RequestParameterEncoding BrunoDumon Thu Mar 13 19:00:02 2003
Date Thu, 13 Mar 2003 18:00:03 GMT
Page: http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding , version: 1 on Thu
Mar 13 17:13:46 2003 by 157.193.121.51

New page created:
+ !!!Request parameter encoding
+ 
+ !!Basics
+ 
+ If your Cocoon application needs to read request parameters that could contain "special"
characters, i.e. characters outside the 128-character ASCII range, you'll need to pay
attention to which encoding is used.
+ 
+ Normally a browser will send data to the server using the same encoding as the page containing
the submitted form (or whatever). So if the pages are serialized using UTF-8, the browser
will submit form data using UTF-8. The user can change the encoding, but it's quite safe to
assume he/she won't do that (have you ever done it?).
+ 
+ After doing some tests with popular browsers, I've noticed that browsers usually do not
let the server know what encoding they used to encode the parameters, so we need to make sure
ourselves that the encoding used when serializing pages corresponds to the encoding used when
decoding request parameters.
+ 
+ First of all, check in the sitemap what encoding is used when serializing HTML pages:
+ 
+ {{{
+ <map:serializer logger="sitemap.serializer.html" mime-type="text/html"
+        name="html" pool-grow="4" pool-max="32" pool-min="4"
+        src="org.apache.cocoon.serialization.HTMLSerializer">
+   <buffer-size>1024</buffer-size>
+   <encoding>UTF-8</encoding>
+ </map:serializer>
+ }}}
+ 
+ In the example above, UTF-8 is the encoding used. This is a widely supported Unicode encoding,
so it is often a good choice.
+ 
+ The HTML serializer will automatically insert a <meta> tag into the HTML page's HEAD
element specifying the encoding. Most browsers apparently require this. The HTML serializer
will however only do this if your page already contains a HEAD (or head) element, so make
sure it has one. The <meta> element inserted by the serializer will then look as follows:
+ 
+ {{{
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ }}}
+ 
+ By default, if the browser doesn't explicitly mention the encoding, a servlet container
will decode request parameters using the ISO-8859-1 encoding (independent of the platform
on which the container is running). So in the above case, where UTF-8 was used when serializing,
any non-ASCII character in a parameter would be decoded incorrectly.
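+ 
+ The mismatch can be reproduced outside a servlet container. The following is only a sketch:
the class and method names (FormDecodingDemo, browserEncode, containerDecode) are made up for
illustration, and java.net.URLEncoder and URLDecoder merely stand in for what a browser and
a container do with form data.
+ 

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormDecodingDemo {
    /** Percent-encodes a value roughly as a browser would for a page in the given encoding. */
    static String browserEncode(String value, String pageEncoding) {
        try {
            return URLEncoder.encode(value, pageEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes a percent-encoded value using the encoding the container assumes. */
    static String containerDecode(String encoded, String assumedEncoding) {
        try {
            return URLDecoder.decode(encoded, assumedEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9";                               // "cafe" with e-acute
        String onTheWire = browserEncode(typed, "UTF-8");         // caf%C3%A9
        String wrong = containerDecode(onTheWire, "ISO-8859-1");  // two characters instead of one
        String right = containerDecode(onTheWire, "UTF-8");

        System.out.println(onTheWire);
        System.out.println(wrong.equals(typed));  // false
        System.out.println(right.equals(typed));  // true
    }
}
```

+ 
+ Decoding with ISO-8859-1 turns the two UTF-8 bytes C3 A9 into two separate characters, which
is the garbled text users typically see.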
+ 
+ The encoding to use when decoding request parameters can be configured in the web.xml by
supplying init parameters called "form-encoding" and "container-encoding" to the Cocoon servlet.
The container-encoding parameter indicates which encoding the container used when it decoded
the request parameters (normally ISO-8859-1), and the form-encoding parameter indicates
the actual encoding. Here's an example of how to specify the parameters in the web.xml:
+ 
+ {{{
+ <init-param>
+   <param-name>container-encoding</param-name>
+   <param-value>ISO-8859-1</param-value>
+ </init-param>
+ <init-param>
+   <param-name>form-encoding</param-name>
+   <param-value>UTF-8</param-value>
+ </init-param>
+ }}}
+ 
+ For Java-insiders: what Cocoon actually does internally is apply the following trick to
get a parameter correctly decoded: suppose "value" is a string containing a request parameter,
then Cocoon will do:
+ 
+ {{{
+ value = new String(value.getBytes("ISO-8859-1"), "UTF-8");
+ }}}
+ 
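+ So it re-encodes the incorrectly decoded string back to the original bytes and decodes them
using the correct encoding. This is lossless because ISO-8859-1 maps every byte value to exactly
one character.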
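+ 
+ A small self-contained check of this trick, as a sketch only: RecodeDemo and recode are
hypothetical names, not Cocoon classes, and the two encodings are passed as arguments in the
same roles as the container-encoding and form-encoding parameters.
+ 

```java
import java.nio.charset.Charset;

class RecodeDemo {
    /**
     * Turns the string back into bytes with the encoding the container used,
     * then decodes those bytes with the encoding the form was really sent in.
     */
    static String recode(String value, String containerEncoding, String formEncoding) {
        byte[] originalBytes = value.getBytes(Charset.forName(containerEncoding));
        return new String(originalBytes, Charset.forName(formEncoding));
    }

    public static void main(String[] args) {
        // e-acute sent as UTF-8 (bytes C3 A9) but decoded as ISO-8859-1
        // arrives as the two characters U+00C3 U+00A9.
        String misdecoded = "caf\u00c3\u00a9";
        String repaired = recode(misdecoded, "ISO-8859-1", "UTF-8");

        System.out.println(repaired.equals("caf\u00e9"));  // true
        System.out.println(repaired.length());             // 4
    }
}
```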
+ 
+ !!Locally overriding the form-encoding
+ 
+ Cocoon is ideally suited for publishing to different kinds of devices, and certain devices
may require a different encoding. In this case, you can redefine the form-encoding for specific
pipelines using the SetCharacterEncodingAction.
+ 
+ To use it, first of all make sure the action is declared in the map:actions element of the
sitemap:
+ {{{
+ <map:action name="set-encoding" src="org.apache.cocoon.acting.SetCharacterEncodingAction"/>
+ }}}
+ 
+ and then call the action at the required location as follows:
+ {{{
+ <map:act type="set-encoding">
+   <map:parameter name="form-encoding" value="some-other-encoding"/>
+ </map:act>
+ }}}
+ 
+ !!Problems with components using the original HttpServletRequest (JSPGenerator, ...)
+ 
+ Some components such as the JSPGenerator use the original HttpServletRequest object instead
of the Cocoon Request object. In that case, request parameters will not be decoded correctly,
which matters if, for example, the JSP page itself reads request parameters.
+ 
+ One possible solution would be to patch these components to use a wrapper class that delegates
all calls to the HttpServletRequest object, except for the getParameter or getParameterValues
methods, which should be delegated to Cocoon's Request object.
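+ 
+ The delegation idea can be sketched as follows. Everything here is hypothetical: SimpleRequest
is a two-method stand-in, not the servlet API, and a real patch would have to cover the whole
HttpServletRequest interface (possibly by extending the HttpServletRequestWrapper class that
Servlet 2.3 provides).
+ 

```java
class DelegatingRequestSketch {
    /** Hypothetical, heavily reduced stand-in for a request API (not the servlet interface). */
    interface SimpleRequest {
        String getParameter(String name);
        String getHeader(String name);
    }

    /** Answers parameter lookups from one request and everything else from another. */
    static class ParameterOverridingRequest implements SimpleRequest {
        private final SimpleRequest original;       // plays the role of the HttpServletRequest
        private final SimpleRequest decodedSource;  // plays the role of Cocoon's Request

        ParameterOverridingRequest(SimpleRequest original, SimpleRequest decodedSource) {
            this.original = original;
            this.decodedSource = decodedSource;
        }

        public String getParameter(String name) {
            return decodedSource.getParameter(name);
        }

        public String getHeader(String name) {
            return original.getHeader(name);
        }
    }

    /** Fixed-answer request, used only to exercise the wrapper. */
    static SimpleRequest fixed(final String parameterValue, final String headerValue) {
        return new SimpleRequest() {
            public String getParameter(String name) { return parameterValue; }
            public String getHeader(String name) { return headerValue; }
        };
    }

    public static void main(String[] args) {
        SimpleRequest raw = fixed("caf\u00c3\u00a9", "text/html");  // wrongly decoded parameter
        SimpleRequest cocoon = fixed("caf\u00e9", null);            // correctly decoded parameter
        SimpleRequest wrapped = new ParameterOverridingRequest(raw, cocoon);

        System.out.println(wrapped.getParameter("q").equals("caf\u00e9"));  // true
        System.out.println(wrapped.getHeader("Accept"));                    // text/html
    }
}
```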
+ 
+ There's an easier solution that can be applied right away if your servlet container supports
the Servlet 2.3 specification. Starting from 2.3, the Servlet specification allows you to explicitly
set the encoding to be used for decoding request parameters, though this has to happen before
any request data is read. Since Cocoon reads request parameters itself (such as cocoon-reload),
this would require modification of the CocoonServlet. But it can also be done using a servlet
filter.  Tomcat 4 contains just such a filter in its "examples" webapp. Look for the file
jakarta-tomcat/webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java. Compile
it (with servlet.jar in the classpath), put it in a jar (keeping the filters package directory) and
put the jar in your webapp's WEB-INF/lib directory.
+ 
+ Now modify your webapp's web.xml file to include the following (after the display-name and
description elements, but before the servlet element):
+ 
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ }}}
+ 
+ Since the filter element is new in the servlet 2.3 specification, you might need to modify
the DOCTYPE declaration in the web.xml:
+ 
+ {{{
+ <!DOCTYPE web-app
+     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+     "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+ 
+ Of course, when using a servlet filter to set the encoding, you should not supply the form-encoding
init parameter anymore in the web.xml. You could still supply the container-encoding parameter,
though its value will now have to be the same as the encoding supplied to the filter. This
will allow you to override the form-encoding using the SetCharacterEncodingAction, though
only for the Cocoon Request object.
+ 
+ Using a servlet filter also has the advantage that it will work for any servlet.  Suppose
your webapp consists of multiple servlets, with Cocoon being only one of them.  Sometimes
the processing could start in another servlet (which sets the character encoding correctly)
and then be forwarded to Cocoon, while other times the processing could start immediately
in the Cocoon servlet. It would then be impossible to know in Cocoon whether the request parameter
encoding needs to be corrected or not.
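+ 
+ To see why a correction applied at the wrong moment does harm, consider a parameter that
has already been decoded correctly (by a filter or by another servlet). This is a sketch under
assumptions: DoubleCorrectionDemo and recode are illustrative names, and recode only mimics
the recoding step described earlier on this page.
+ 

```java
import java.nio.charset.Charset;

class DoubleCorrectionDemo {
    /** Re-encode with the assumed container encoding, then decode with the form encoding. */
    static String recode(String value, String containerEncoding, String formEncoding) {
        byte[] bytes = value.getBytes(Charset.forName(containerEncoding));
        return new String(bytes, Charset.forName(formEncoding));
    }

    public static void main(String[] args) {
        // Already decoded correctly before it reaches the recoding step.
        String correct = "caf\u00e9";

        // container-encoding equal to the real encoding: the round trip changes nothing.
        String untouched = recode(correct, "UTF-8", "UTF-8");
        System.out.println(untouched.equals(correct));       // true

        // container-encoding left at ISO-8859-1: the lone byte E9 is not valid UTF-8,
        // so Java substitutes the replacement character U+FFFD.
        String damaged = recode(correct, "ISO-8859-1", "UTF-8");
        System.out.println(damaged.equals(correct));         // false
        System.out.println(damaged.indexOf('\uFFFD') >= 0);  // true
    }
}
```

+ 
+ The first case is also why container-encoding has to match the filter's encoding when both
are used: the recoding then leaves correctly decoded values alone.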
+ 


Page: http://wiki.cocoondev.org/Wiki.jsp?page=BrunoDumon , version: 2 on Thu Mar 13 17:17:08
2003 by 157.193.121.51

- * [ImplementingTransformers]
+ * [DevelopingComponents] and [ImplementingTransformers]
+ * [RequestParameterEncoding]


