cocoon-docs mailing list archives

From da...@cocoon.zones.apache.org
Subject [DAISY] Updated: How to configure consistent encoding in Cocoon
Date Mon, 21 May 2007 21:59:06 GMT
A document has been updated:

http://cocoon.zones.apache.org/daisy/documentation/1366.html

Document ID: 1366
Branch: main
Language: default
Name: How to configure consistent encoding in Cocoon (unchanged)
Document Type: Cocoon Document (unchanged)
Updated on: 5/21/07 9:58:43 PM
Updated by: Reinhard Pötz

A new version has been created, state: draft

Parts
=====

Content
-------
This part has been updated.
Mime type: text/xml (unchanged)
File name:  (unchanged)
Size: 14028 bytes (previous version: 14259 bytes)
Content diff:
    <html>
    <body>
    
--- <p>The best for internationalization, ie. support of umlaute, special
--- characters, non-english languages, is to handle everything in UTF-8, since this
--- is probably the most intelligent encoding available out there. If you need
--- another encoding, simply replace all occurrences of UTF-8 with that one, but
--- note that this guide was only tested with UTF-8, other encodings might not be
--- supported at all places.</p>
+++ <p>The best approach to internationalization, i.e. support for umlauts, special
+++ characters and non-English languages, is to handle everything in UTF-8, because
+++ UTF-8 can encode every Unicode character.</p>
    
--- <p>The following How-To covers the typical steps to achieve a consistent
--- encoding everywhere. Some <a href="#theory">Background Information</a> can be
--- found at the end of this page.</p>
+++ <p class="note">If you need another encoding, simply replace all occurrences of
+++ UTF-8 with that one. Note, however, that this guide was only tested with UTF-8;
+++ other encodings might not be supported everywhere.</p>
    
--- <h3>1. Sending all pages in UTF-8</h3>
+++ <p>The following how-to covers the typical steps to achieve a consistent
+++ encoding throughout a Cocoon application. Some <a href="#theory">Background
+++ information</a> can be found at the end of this page.</p>
    
+++ <h1>1. Sending all pages in UTF-8</h1>
+++ 
    <p>You need to configure Cocoon's serializers to UTF-8. The XML serializer
    (<tt>&lt;serialize type="xml" /&gt;</tt>) and the HTML serializer
    (<tt>&lt;serialize type="html" /&gt;</tt>) need to be configured. To support all
(20 equal lines skipped)
    &lt;/serializer&gt;
    </pre>
    
--- <h3>2. AJAX Requests with CForms/Dojo</h3>
+++ <h1>2. AJAX Requests with CForms/Dojo</h1>
    
    <p>If you use CForms with ajax enabled, Cocoon will make use of dojo.io.bind()
    under the hood, which creates XMLHttpRequests that POST the form data to the
(8 equal lines skipped)
    <p>You might already have other djConfig options; in that case, simply add the
    <tt>bindEncoding</tt> property to the hash map.</p>
    
--- <h3>3. Decoding incoming requests: Servlet Container</h3>
+++ <h1>3. Decoding incoming requests: Servlet Container</h1>
    
    <p>When the browser sends data to your server, e.g. form data, the
    <tt>ServletRequest</tt> will be created by your servlet container, which needs
(60 equal lines skipped)
        "http://java.sun.com/dtd/web-app_2_3.dtd"&gt;
    </pre>
    
--- <h3>4. Setting Cocoon's encoding (especially CForms)</h3>
+++ <h1>4. Setting Cocoon's encoding (especially CForms)</h1>
    
    <p>To tell Cocoon to use UTF-8 internally, you have to set 2 properties:</p>
    
(6 equal lines skipped)
    containerencoding must be the same as the one you specified in the
    SetCharacterEncodingFilter. But here we are using UTF-8 everywhere anyway.</p>
    
--- <h3>5. XML Files</h3>
+++ <h1>5. XML Files</h1>
    
    <p>This is normally not a problem, since the standard encoding for XML files is
    UTF-8. However, they should always start with the following instruction, which
(3 equal lines skipped)
    <pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
    </pre>
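To see why the declaration matters, here is a small sketch using only the JDK's built-in DOM parser (not Cocoon's own parsing code). It encodes the same word as UTF-8 bytes twice, once with a matching declaration and once with a declaration that claims ISO-8859-1; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of Cocoon.
class XmlDeclarationDemo {

    // Builds an XML document whose bytes are always UTF-8 encoded,
    // while its XML declaration names the given encoding.
    static byte[] utf8BytesDeclaredAs(String declaredEncoding, String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<word>" + text + "</word>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Parses raw bytes and returns the root element's text. The parser takes
    // the character encoding from the XML declaration, not from the caller.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String word = "K\u00e4se"; // contains U+00E4 (a-umlaut)
        // Declaration matches the bytes: the umlaut survives.
        System.out.println(rootText(utf8BytesDeclaredAs("UTF-8", word)));
        // Declaration does not match: each of the two UTF-8 bytes of the
        // umlaut is read as a separate ISO-8859-1 character.
        System.out.println(rootText(utf8BytesDeclaredAs("ISO-8859-1", word)));
    }
}
```

With the matching declaration the parser returns "Käse"; with the wrong one it returns "KÃ¤se", the typical symptom of UTF-8 bytes read as ISO-8859-1.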
    
--- <h3>6. Special Transformers</h3>
+++ <h1>6. Special Transformers</h1>
    
    <p>The standard XSLT transformers and others work on SAX events, which are
    not serialized, so encoding is not a problem. However, some special
    transformers pass data on to another library that serializes it and might
    need a hint to use the correct encoding. One example is the NekoHTMLTransformer:
--- <a href="https://issues.apache.org/jira/browse/COCOON-2063"><img width="11" height="11" src="http://wiki.apache.org/wiki/modern/img/moin-www.png"/>
--- https://issues.apache.org/jira/browse/COCOON-2063</a>.</p>
+++ <a href="https://issues.apache.org/jira/browse/COCOON-2063">https://issues.apache.org/jira/browse/COCOON-2063</a>.
+++ </p>
    
    <p>If you suspect that a transformer in your pipeline is mishandling the
    encoding, add a <tt>TeeTransformer</tt> between each step, writing the XML
    between the transformers into temp1.xml, temp2.xml and so on, to find the
    place where your umlauts and special characters get garbled.</p>
    
--- <h3>7. Your own XML serializing Sources</h3>
+++ <h1>7. Your own XML serializing Sources</h1>
    
    <p>If you have your own Source implementation that needs to serialize XML, make
    sure it will do that in UTF-8 as well. A good idea is to use Cocoon's XML
(2 equal lines skipped)
    <a href="http://wiki.apache.org/cocoon/UseCocoonXMLSerializerCode">UseCocoonXMLSerializerCode</a>
    </p>
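The page recommends Cocoon's own XML serializer for this (see the linked wiki page). As a minimal alternative sketch that relies only on the JDK's JAXP API, and not on Cocoon's serializer, a custom Source could serialize DOM content as follows; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not Cocoon's XML serializer.
class Utf8XmlSerializer {

    // Serializes a DOM document to bytes, explicitly requesting UTF-8 both for
    // the byte encoding and for the encoding="..." in the XML declaration.
    static byte[] toUtf8(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Writing to an OutputStream (not a Writer) lets the transformer
            // perform the character-to-byte encoding itself.
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    // Builds a tiny sample document containing U+00E4 (a-umlaut).
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("word");
            root.setTextContent("K\u00e4se");
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8(sample());
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Passing an OutputStream rather than a Writer matters: with a Writer, the byte encoding would be chosen by whoever created the Writer, and it could silently disagree with the declaration the transformer writes.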
    
--- <h2 id="theory">Further information</h2>
+++ <h1 id="theory">Further information</h1>
    
--- <h3>Browser encoding basics</h3>
+++ <h2>Browser encoding basics</h2>
    
--- <h4>Getting pages</h4>
+++ <h3>Getting pages</h3>
    
    <p>If your Cocoon application needs to read request parameters that could
    contain <em>special</em> characters, i.e. characters outside of the first 128
(20 equal lines skipped)
    HTML serializer if you configure it with the parameters mime-type and encoding,
    as stated above.</p>
    
--- <h4>Sending form data</h4>
+++ <h3>Sending form data</h3>
    
    <p>By default, if the browser doesn't explicitly mention the encoding, a
    servlet container will decode request parameters using the ISO-8859-1 encoding
(15 equal lines skipped)
    impossible to know in Cocoon whether the request parameter encoding needs to be
    corrected or not (see below).</p>
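The effect of the ISO-8859-1 default can be reproduced with plain Java. This sketch is illustrative only: it uses <tt>java.net.URLDecoder</tt> in place of a servlet container (the overload taking a <tt>Charset</tt> needs Java 10 or later), and the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not servlet container code.
class FormDecodingDemo {

    // Decodes one application/x-www-form-urlencoded value with the given
    // charset, similar to what a container does for a request parameter.
    static String decode(String urlEncoded, Charset charset) {
        return URLDecoder.decode(urlEncoded, charset);
    }

    public static void main(String[] args) {
        // A word with U+00E4 as a browser sends it from a UTF-8 page:
        // the a-umlaut becomes the two percent-encoded bytes C3 A4.
        String fromUtf8Page = "K%C3%A4se";
        // Decoded with the encoding the browser actually used.
        System.out.println(decode(fromUtf8Page, StandardCharsets.UTF_8));
        // Decoded with the container's ISO-8859-1 default: garbled.
        System.out.println(decode(fromUtf8Page, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding <tt>%C3%A4</tt> as UTF-8 yields "ä"; decoding it as ISO-8859-1 yields the two characters "Ã¤".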
    
--- <h3>Request parameter decoding in Cocoon</h3>
+++ <h2>Request parameter decoding in Cocoon</h2>
    
--- <h4>Fixing a wrong servlet container</h4>
+++ <h3>Fixing a wrong servlet container</h3>
    
    <p>If you are not able to set the default encoding for your servlet container to
    what you actually want, it is possible to configure Cocoon to re-decode
(23 equal lines skipped)
    page itself would read request parameters). The only working solution seems to
    be the servlet-filter here.</p>
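Conceptually, re-decoding a wrongly decoded parameter means turning the characters back into the original bytes with the container's encoding, then decoding those bytes again with the form encoding. The following is a minimal sketch of that idea, assuming the container used ISO-8859-1 and the browser sent UTF-8; it is not Cocoon's actual implementation, and <tt>redecode</tt> is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Cocoon's request wrapper.
class ReDecodeDemo {

    // Undoes a wrong decode: recover the raw bytes with the encoding the
    // container used, then decode them with the encoding the browser used.
    static String redecode(String value, Charset containerEncoding, Charset formEncoding) {
        byte[] originalBytes = value.getBytes(containerEncoding);
        return new String(originalBytes, formEncoding);
    }

    public static void main(String[] args) {
        // A word with U+00E4 that the container wrongly decoded as
        // ISO-8859-1, leaving U+00C3 U+00A4 in place of the umlaut.
        String asDecodedByContainer = "K\u00c3\u00a4se";
        System.out.println(redecode(asDecodedByContainer,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

The round trip is lossless here because ISO-8859-1 maps each of the 256 possible byte values to exactly one character, so <tt>getBytes</tt> recovers exactly the bytes the browser sent. It also illustrates why the configured container encoding must match what the container really used: with a different charset, the recovered bytes would be wrong.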
    
--- <h4>Locally overriding the form-encoding</h4>
+++ <h3>Locally overriding the form-encoding</h3>
    
    <p>Cocoon is ideally suited for publishing to different kinds of devices, and it
    may well be possible that for certain devices, it is required to use different
(13 equal lines skipped)
    &lt;/map:act&gt;
    </pre>
    
--- <h3>Operating System Preliminaries</h3>
+++ <h2>Operating System Preliminaries</h2>
    
    <p>Not having influence on request parameter decoding, but sometimes making
    trouble with text files, database communication, etc. are operating system
(8 equal lines skipped)
    <a href="http://wiki.apache.org/cocoon/SettingTheJvmLocale">SettingTheJvmLocale</a>.
    </p>
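Besides setting the JVM locale as described in the linked wiki page, Java code that reads or writes text files can avoid depending on the platform default altogether by naming the charset explicitly. A minimal sketch, with a hypothetical class name:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class.
class ExplicitCharsetDemo {

    // Writes and re-reads a temporary text file with an explicit charset, so
    // the result does not depend on the JVM's platform default encoding.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The platform default varies with OS and locale settings.
        System.out.println("Platform default: " + Charset.defaultCharset());
        // The explicit UTF-8 round trip preserves U+00E4 regardless.
        System.out.println(roundTrip("K\u00e4se").equals("K\u00e4se"));
    }
}
```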
    
--- <h3>More readings</h3>
+++ <h2>More readings</h2>
    
    <ul>
    <li>
    <p>
--- <a href="http://marc.theaimsgroup.com/?t=106760662600010&amp;r=1&amp;w=2"><img width="11" height="11" src="http://wiki.apache.org/wiki/modern/img/moin-www.png"/>
--- cocoon's defaults form-encoding and seerialize-encoding</a>
+++ <a href="http://marc.theaimsgroup.com/?t=106760662600010&amp;r=1&amp;w=2">Cocoon's
+++ defaults form-encoding and serialize-encoding</a>
    <a href="http://wiki.apache.org/cocoon/MarcPortier">MarcPortier</a>'s proposal to
    remove inconsistencies in the way Cocoon handles the encoding of serialized text
    and request-parameter decoding.
--- <a href="http://marc.theaimsgroup.com/?l=xml-cocoon-dev&amp;m=106772461923197&amp;w=2"><img width="11" height="11" src="http://wiki.apache.org/wiki/modern/img/moin-www.png"/>
--- This</a> is a good summary of the thread.</p>
+++ <a href="http://marc.theaimsgroup.com/?l=xml-cocoon-dev&amp;m=106772461923197&amp;w=2">This</a>
+++ is a good summary of the thread.</p>
    </li>
    <li>Cocoon does not support the HTTP request header
    <a href="http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset">Accept-Charset</a>,
--- where the browser specifies a list of encodings he can handle. Maybe this might
+++ where the browser specifies a list of encodings it can handle. It might be
    useful to implement this.</li>
    </ul>
    
(2 equal lines skipped)


Collections
===========
Removed from collection: cdocs-site-main
Added to collection: cdocs-site-22
