cocoon-users mailing list archives

From "Jens Lorenz" <>
Subject Re: encoding problem with xslt
Date Fri, 12 Jul 2002 14:05:50 GMT
----- Original Message -----
From: "thorsten schmid" <>
To: <>
Cc: <>
Sent: Friday, July 12, 2002 2:17 PM
Subject: encoding problem with xslt

Hi Thorsten,


> ================================================================
> output:
> <a
> Integrations&auml;mter
> </a>               .
> ================================================================

This output is certainly correct. URIs generated via the HTML output
method of Xalan or Saxon are UTF-8 encoded (and äöü are 2 bytes wide
when using UTF-8). This is recommended by RFC 2718.
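To make the byte widths concrete, here is a minimal Java sketch. It does not use Xalan or Saxon; the class name Utf8UriDemo and the helper escapeUtf8 are made up for illustration. java.net.URLEncoder is only a stand-in for the processors' own escaping (it implements form encoding, so it would turn a space into "+", which a real URI escaper would not), and the Charset overload used here assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class Utf8UriDemo {
    // Percent-encodes the UTF-8 bytes of every non-ASCII character.
    // Hypothetical helper, not part of Xalan, Saxon or Cocoon.
    static String escapeUtf8(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Integrations\u00e4mter"; // \u00e4 is the a-umlaut

        // 17 characters, but 18 bytes in UTF-8: the umlaut takes two bytes.
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);
        // In ISO-8859-1 the same umlaut is a single byte.
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length);
        // The two UTF-8 bytes 0xC3 0xA4 show up as two escapes.
        System.out.println(escapeUtf8(name)); // Integrations%C3%A4mter
    }
}
```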

The problem is not Cocoon, but the Servlet spec. Tomcat's default
encoding is ISO-8859-1, so your URI is decoded with ISO-8859-1.
This obviously breaks your Cocoon servlet later.
Since the HTTP protocol does not send an encoding with the URI, there is
also no chance for Tomcat to detect the encoding of the URI. And
even worse, Request.setEncoding() affects only parameters (of a GET
request; POST is immune, since during a POST the encoding is sent by
the browser).
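The effect of decoding UTF-8 bytes as ISO-8859-1 can be reproduced outside any container. This is only a sketch of the mismatch itself, not of Tomcat's actual decoding code; the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    // Takes the UTF-8 bytes of a string and decodes them as ISO-8859-1,
    // which is the mismatch described above. Hypothetical helper.
    static String decodeUtf8BytesAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sent = "Integrations\u00e4mter";
        String seen = decodeUtf8BytesAsLatin1(sent);

        // Each UTF-8 byte becomes one character, so the string grows by one.
        System.out.println(sent.length()); // 17
        System.out.println(seen.length()); // 18
        // Bytes 0xC3 0xA4 turn into U+00C3 and U+00A4 instead of the umlaut.
        System.out.println(seen.equals("Integrations\u00c3\u00a4mter")); // true
    }
}
```

Any lookup that uses the 18-character string as a file name or sitemap match will miss the resource named with the real umlaut.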

You have three options:

First option is to set the encoding of your servlet container to UTF-8.
For Tomcat you do this by setting CATALINA_OPTS to
"-Dfile.encoding=UTF-8". But beware that this might break your existing
plain text files, which are most probably ISO-8859-1 encoded. With XML
files this is no problem, as long as you specify their encoding correctly.

Second option is to manually recode the URI within Cocoon via some
custom code. But this is somewhat "hacky".
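Such a manual recode could look roughly like the sketch below. It assumes the container has already decoded the URI bytes as ISO-8859-1, and the names UriRecoder and recodeLatin1ToUtf8 are invented here; where exactly to hook this into Cocoon is left open.

```java
import java.nio.charset.StandardCharsets;

class UriRecoder {
    // Reverses an ISO-8859-1 decode and reinterprets the bytes as UTF-8.
    // This works because ISO-8859-1 maps every byte to exactly one
    // character, so the original bytes can be recovered without loss.
    static String recodeLatin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the servlet sees after the wrong decode: U+00C3 U+00A4.
        String seen = "Integrations\u00c3\u00a4mter";
        String fixed = recodeLatin1ToUtf8(seen);
        System.out.println(fixed.equals("Integrations\u00e4mter")); // true
    }
}
```

The hacky part is that this is only safe when the client really sent UTF-8. If a client sent a single ISO-8859-1 byte for the umlaut, the UTF-8 decode finds an invalid sequence and substitutes the replacement character U+FFFD.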

Third option, and probably the best, is to use a Servlet filter in front
of Cocoon which does the transformation of character encodings for
you. This way, you don't have to break text files read and written
by the Tomcat JVM, and you can still use full UTF-8 within your URIs.

If anyone has some more ideas on this topic (non-ISO-8859-1 characters
within URIs), I would greatly appreciate some more input.
My conclusion is to avoid such characters in URIs. But this does
not get easily into the heads of our customers and users. (e.g. file

Best Regards,



jens.lorenz at interface-projects dot de

interface:projects GmbH                             \\|//
Tolkewitzer Strasse 49                              (o o)
01277 Dresden                               ~~~~oOOo~(_)~oOOo~~~~
