cocoon-docs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cocoon Wiki] Update of "RequestParameterEncoding" by AlexanderKlimetschek
Date Thu, 10 May 2007 10:58:53 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cocoon Wiki" for change notification.

The following page has been changed by AlexanderKlimetschek:
http://wiki.apache.org/cocoon/RequestParameterEncoding

------------------------------------------------------------------------------
  = Request parameter encoding =
  
+ == How to set everything to UTF-8 with Cocoon and CForms (with Ajax and Dojo) ==
+ 
+ The most robust approach to internationalization is to handle everything in UTF-8, since
it can represent every Unicode character and is supported practically everywhere. "Everything"
means the server side (backend, XML), HTTP requests and responses, and the client side with
forms and dojo.io.bind.
+ 
+ === 1. Sending all pages in UTF-8 ===
+ 
+ You need to configure Cocoon's serializers to use UTF-8, in particular the XML serializer
({{{<serialize type="xml" />}}}) and the HTML serializer ({{{<serialize type="html" />}}}).
To support all browsers, you must state the encoding of the response body (the {{{charset}}}
in the {{{mime-type}}} below) and also include a meta tag in the HTML:
{{{<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">}}}. This is very
important, because the browser will then send form requests encoded in UTF-8 (browsers normally
don't mention the encoding in the request, so you have to assume they are doing it right).
Here is the serializer configuration for your sitemaps that does that:
+ 
+ {{{
+ <serializer name="xml" mime-type="text/xml"
+   src="org.apache.cocoon.serialization.XMLSerializer">
+   <encoding>UTF-8</encoding>
+ </serializer>
+ 
+ <serializer name="html" mime-type="text/html; charset=UTF-8"
+   src="org.apache.cocoon.serialization.HTMLSerializer">
+   <encoding>UTF-8</encoding>
+ 
+   <!-- the following common doctype is only included for completeness, it has no impact
on encoding -->
+   <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
+   <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
+ </serializer>
+ }}}
+ 
+ === 2. AJAX Requests with CForms/Dojo ===
+ 
+ If you use CForms with Ajax enabled, Cocoon uses dojo.io.bind() under the hood, which
creates XMLHttpRequests that POST the form data to the server. Here Dojo chooses the encoding
itself by default, which does not match the browser's behaviour of using the charset defined
in the META tag. You can easily tell Dojo which encoding to use for all dojo.io.bind() calls:
include the following at the top of your HTML pages, before dojo.js is included:
+ 
+ {{{
+ <script>djConfig = { bindEncoding: "utf-8" };</script>
+ }}}
+ 
+ If you already have other djConfig options, simply add the {{{bindEncoding}}} property
to the existing object.
+ 
+ === 3. Decoding incoming requests: Servlet Container ===
+ 
+ When the browser sends data to your server, e.g. form data, your servlet container creates
the {{{ServletRequest}}} and has to decode the parameters correctly into Java Strings. If
the encoding is specified in the HTTP request header, the container will use it, but
unfortunately this is typically not the case. When the browser sends a form post, it will
only say {{{application/x-www-form-urlencoded}}} in the header. So you have to assume an
encoding here, and the right thing to assume is the encoding of the page you originally sent
to the browser.
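+ 
+ To see why the assumed encoding matters, here is a minimal, self-contained Java sketch
(plain JDK 10 or later, no servlet API; the class and method names are made up for illustration).
It percent-encodes a value roughly the way a browser does for a UTF-8 page and then decodes
it twice, once as UTF-8 and once as ISO-8859-1:
+ 
```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FormDecodingDemo {

    // Percent-encodes a form value as a browser would for a page served in UTF-8.
    static String encodeLikeBrowser(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decodes a percent-encoded value with the charset the server side assumes.
    static String decodeLikeContainer(String encoded, Charset assumed) {
        return URLDecoder.decode(encoded, assumed);
    }

    public static void main(String[] args) {
        // "K\u00e4se" is "Kaese" written with an a-umlaut (U+00E4).
        String wire = encodeLikeBrowser("K\u00e4se");
        System.out.println(wire); // K%C3%A4se

        // Correct assumption: the umlaut survives.
        System.out.println(decodeLikeContainer(wire, StandardCharsets.UTF_8));

        // Wrong assumption: the two UTF-8 bytes turn into two separate characters.
        System.out.println(decodeLikeContainer(wire, StandardCharsets.ISO_8859_1));
    }
}
```
+ 
+ Decoded as ISO-8859-1, each UTF-8 byte of the umlaut becomes a character of its own, which
is the typical garbage you see when the container guesses wrong.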
+ 
+ The servlet specification says that the default encoding for incoming requests is ISO-8859-1
(Jetty deviates from the standard here and assumes UTF-8 by default). So to make sure
UTF-8 is used for parameter decoding, you have to tell your servlet container that encoding
explicitly. This is done by calling {{{ServletRequest.setCharacterEncoding()}}} before any
parameters are read. To do that for all your requests, you can use a servlet filter like
this one: SetCharacterEncodingFilter.
+ 
+ Then you add the filter to the web.xml:
+ 
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+ 
+ <!-- either mapping to URL pattern -->
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ 
+ <!-- or mapping to your Cocoon servlet (the servlet-name might be different) -->
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <servlet-name>CocoonBlocksDispatcherServlet</servlet-name>
+ </filter-mapping>
+ 
+ }}}
+ 
+ Since the filter element was added in the Servlet 2.3 specification, your web.xml needs
to declare at least version 2.3, but using the current 2.4 version is better, as it is the
standard for Cocoon web applications. For 2.4 you use an XSD schema:
+ 
+ {{{
+ <web-app version="2.4"
+          xmlns="http://java.sun.com/xml/ns/j2ee"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
+ }}}
+ 
+ For 2.3 you need to modify the DOCTYPE declaration in the web.xml:
+ 
+ {{{
+ <!DOCTYPE web-app
+     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+     "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+ 
+ === 4. Setting Cocoon's encoding (especially CForms) ===
+ 
+ To tell Cocoon to use UTF-8 internally, you have to set 2 properties:
+ 
+ {{{
+ org.apache.cocoon.containerencoding=utf-8
+ org.apache.cocoon.formencoding=utf-8
+ }}}
+ 
+ They need to be in some {{{*.properties}}} file under {{{META-INF/cocoon/properties}}} in
one of your blocks.
+ 
+ === 5. XML Files ===
+ 
+ This is normally not a problem, since the default encoding for XML files is UTF-8. However,
they should always start with the following XML declaration, which should also make your
XML editor save them in UTF-8 (most editors seem to do that, so there should not be a problem
here).
+ 
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?>
+ }}}
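+ 
+ The declaration only helps if it agrees with the bytes that are actually stored. The following
sketch illustrates this with the JDK's built-in XML parser (class and method names are
hypothetical): the same text is written once with a matching declaration and once with UTF-8
bytes behind an ISO-8859-1 declaration.
+ 
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlDeclarationDemo {

    // Builds a one-element document that declares one charset
    // but is stored as bytes in a possibly different one.
    static byte[] write(String text, Charset declared, Charset actual) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared.name() + "\"?>"
                + "<word>" + text + "</word>";
        return xml.getBytes(actual);
    }

    // Parses raw bytes; the parser takes the charset from the XML declaration.
    static String read(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```
+ 
+ With a matching declaration, {{{read(write(...))}}} returns the original text for both UTF-8
and ISO-8859-1. With UTF-8 bytes behind an ISO-8859-1 declaration, the parser does not complain
and silently returns garbled umlauts.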
+ 
+ === 6. Special Transformers ===
+ 
+ The standard XSLT transformers and most others work on SAX events, which are not serialized,
so encoding is not a problem there. But some special transformers pass data on to another
library that does serialize and might need a hint to use the correct encoding. One example
is the NekoHTMLTransformer: https://issues.apache.org/jira/browse/COCOON-2063.
+ 
+ If you suspect that a transformer in your pipeline is doing things wrong, add a {{{TeeTransformer}}}
between each step, writing the XML between the transformers to temp1.xml, temp2.xml and
so on, and look for the place where your umlauts and special characters get messed up.
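+ 
+ When there are many dump files, a rough heuristic can help to find the first broken one.
The following sketch (plain JDK; the class and method names are hypothetical, and it assumes
the dumps were written as UTF-8) flags text that looks like UTF-8 which was read as ISO-8859-1
and then encoded again, i.e. an "Ã" followed by a character from the range U+0080 to U+00BF:
+ 
```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class MojibakeCheck {

    // U+00C3 followed by U+0080..U+00BF is what the two-byte UTF-8 sequence
    // of a character in U+00C0..U+00FF looks like after decoding it as ISO-8859-1.
    private static final Pattern DOUBLE_ENCODED =
            Pattern.compile("\\u00c3[\\u0080-\\u00bf]");

    // Heuristic only: legitimate text can contain such a pair, but rarely does.
    static boolean looksDoubleEncoded(String text) {
        return DOUBLE_ENCODED.matcher(text).find();
    }

    // Usage: java MojibakeCheck temp1.xml temp2.xml ...
    public static void main(String[] args) throws Exception {
        for (String name : args) {
            String text = new String(Files.readAllBytes(Paths.get(name)), StandardCharsets.UTF_8);
            System.out.println(name + ": " + (looksDoubleEncoded(text) ? "suspicious" : "ok"));
        }
    }
}
```
+ 
+ The first file reported as suspicious points to the pipeline step right before it. The check
does not replace looking at the files yourself, since it only catches this one common pattern.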
+ 
+ === 7. Your own XML serializing Sources ===
+ 
+ If you have your own Source implementation that needs to serialize XML, make sure it
does that in UTF-8 as well. A good idea is to use Cocoon's XML serializer, since it was already
configured for UTF-8 above. Sample code that does that is here: ["UseCocoonXMLSerializerCode"]
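+ 
+ If you cannot reach Cocoon's serializer from your code, the same idea can be sketched with
the JDK's own JAXP API. This is not the code from the wiki page referenced above, just a
minimal stand-in with made-up names: it sets the output encoding explicitly and writes to
a byte stream, so that no platform default charset can sneak in:
+ 
```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class Utf8XmlWriter {

    // Serializes a DOM document to bytes, always encoded and declared as UTF-8.
    static byte[] serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    // Builds a tiny sample document: <word>text</word>
    static Document sample(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("word");
            root.setTextContent(text);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
    }
}
```
+ 
+ Writing to an {{{OutputStream}}} rather than a {{{Writer}}} lets the serializer apply the
encoding it declares; with a {{{Writer}}}, the bytes are produced by whatever charset the
writer was created with.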
+ 
+ 
+ == Older documentation ==
+ 
- == Basics ==
+ === Basics ===
  
  If your Cocoon application needs to read request parameters that could contain ''special''
characters, i.e. characters outside the 7-bit ASCII range, you'll need to pay
attention to what encoding is used.
  
