cocoon-users mailing list archives

From Bruno Dumon
Subject Re: Short Introduction to using Cocoon with non-roman languages - was: Has anyone used Cocoon for chinese language application ?
Date Sat, 29 May 2004 10:02:14 GMT
On Fri, 2004-05-28 at 22:18, Jasper Michalczik wrote:
> Dear Reinhard, dear Cocoon-users,
> I was asked to give a short explanation on how to use Cocoon for
> non-roman languages - especially Arabic - which should be of use for
> Chinese as well.
> I'm not very experienced with Cocoon, so please feel free to correct or
> extend this.
> All files have to be saved as utf-8, so make sure to add/change the
> first line of your xml/xsl-files:
> 	<?xml version="1.0" encoding="UTF-8"?>

This isn't a requirement: it can be any encoding you like, as long as it
supports the characters you need. It can also be a different encoding than
the one used to send the page to the browser. UTF-8 is a good
choice, though.
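
To illustrate the point, here is a minimal Java sketch (not Cocoon code; the
class and method names are made up for this example). It encodes a short Arabic
string in several charsets and decodes it again. Any charset that can represent
the characters gives back the same text, while ISO-8859-1 cannot represent them
and loses them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {
    // Hypothetical helper: encode the text to bytes in the given charset,
    // then decode those bytes again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Five Arabic letters, written as escapes so this source file
        // itself can be saved in any encoding.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // UTF-8 and UTF-16 both support Arabic, so the text survives.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_16).equals(arabic));

        // ISO-8859-1 has no Arabic letters; getBytes() substitutes a
        // replacement byte, so the original text is lost.
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1).equals(arabic));
    }
}
```

The same reasoning applies to an XML file: once the parser has decoded it
according to its declaration, the characters are the same whatever encoding the
file was stored in.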

> In sitemap.xmap I added the following to each serializer:
> 	<map:serializer logger=...>
> 		<encoding>UTF-8</encoding>
> 	</map:serializer>
> This adds the following META-Tag to the serialized document:
> 	<META http-equiv="Content-Type" content="text/html; charset=UTF-8">

Yep, but only if your page already has an html/head element in it.

> Then I set the following parameters in web.xml...
> 	<init-param>
> 		<param-name>container-encoding</param-name>
> 		<param-value>ISO-8859-1</param-value>
> 	</init-param>
> 	<init-param>
> 		<param-name>form-encoding</param-name>
> 		<param-value>UTF-8</param-value>
> 	</init-param>
> ... to make sure the forms are processed correctly.
> On the client side at least Windows 2000 (I don't know about Linux or
> Mac) must be used with the keyboard settings set up to allow
> Arabic/Chinese typing. If you only need to display non-roman characters,
> this also works with any system and a browser that supports
> Unicode-display. IE5+ for example downloads the necessary fonts
> automatically when needed.
> I remember having some trouble using Tomcat 4.1.29, but 4.1.18 works
> fine.

This is because of the following issue:

>  I don't have any experience with any other version or
> servlet container.
> The only thing I can't explain is why the container-encoding in web.xml
> has to be set to ISO-8859-1. If anybody knows about this, please add it
> to this text. Any other setting I tried didn't work.

It has to be ISO-8859-1, always. This is because the servlet
specification requires that request parameters are by default decoded as
ISO-8859-1 (regardless of the default platform encoding). The only
reason I can imagine this is configurable at all is to work around buggy
servlet containers.
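
A minimal Java sketch of why the two settings work together, assuming the
form-encoding is applied by turning the container-decoded string back into
bytes and decoding those bytes again (this is an illustration, not Cocoon's
actual code; the class and method names are hypothetical). Because ISO-8859-1
maps every byte to exactly one character, the original bytes can be recovered
without loss.

```java
import java.nio.charset.StandardCharsets;

public class FormRedecodeDemo {
    // Simulates a servlet container: the browser sent UTF-8 bytes, but the
    // container decodes request parameters as ISO-8859-1 by default.
    static String containerDecode(String submitted) {
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undoes the container's decoding (container-encoding = ISO-8859-1)
    // and decodes the recovered bytes as the real form encoding (UTF-8).
    static String redecode(String containerDecoded) {
        byte[] raw = containerDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters

        String garbled = containerDecode(typed);
        // Each Arabic letter is two bytes in UTF-8, so the container
        // produces ten Latin-1 characters instead of five Arabic ones.
        System.out.println(garbled.length());

        // Re-decoding restores what the user typed.
        System.out.println(redecode(garbled).equals(typed));
    }
}
```

If the container-encoding were set to anything other than what the container
really used, the first step would not recover the original bytes and the
result would stay garbled.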

More background on all this is also available at:

> I hope I could make a small contribution to the growing
> cocoon-community...


> Jasper Michalczik

Bruno Dumon                   
Outerthought - Open Source, Java & XML Competence Support Center                
