cocoon-users mailing list archives

From Harald Wehr <hw...@hs-harz.de>
Subject URLEncoding of special characters
Date Thu, 22 Apr 2004 06:35:01 GMT
I have a problem with special German characters occurring in URLs.
I made a minimal example to show the problem. Assume the following
pipeline snippet:

<map:match pattern="SpecialCharacters.html">
   <map:generate type="file" src="context://content/test1.xml"/>
   <map:serialize type="html"/>
</map:match>

test1.xml looks like this. Note the special German characters in the
URLs (I hope they are displayed correctly in your mail client):

<?xml version="1.0" encoding="iso-8859-1" ?>
<html>
    <head>
      <title>Test</title>
    </head>
    <body>
       <a href="ÜTest.html">ÜTest</a>
       <a href="ÄTest.html">ÄTest</a>
    </body>
</html>

The HTML serializer encodes the URLs as follows (source code of the
generated HTML file):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Test</title>
</head>
<body>
<a href="%C3%9CTest.html">&Uuml;Test</a>
<a href="%C3%84Test.html">&Auml;Test</a>
</body>
</html>

So Ü is encoded as %C3%9C and Ä as %C3%84, but I need %DC for Ü and
%C4 for Ä.

The java.net.URLEncoder.encode method gives the following:

System.out.print(java.net.URLEncoder.encode("ÜÄ","UTF-8"));
Result: %C3%9C%C3%84

System.out.print(java.net.URLEncoder.encode("ÜÄ","ISO-8859-1"));
Result: %DC%C4
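
For anyone who wants to reproduce this, here is a self-contained sketch
of the two calls above. The class name EncodingCompare and the helper
encode are made up for this example. The umlauts are written as Unicode
escapes so the result does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Cocoon.
public class EncodingCompare {

    // Percent-encodes text using the named charset.
    // The checked exception is wrapped so callers stay simple.
    public static String encode(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String umlauts = "\u00DC\u00C4"; // the characters Ü and Ä

        // UTF-8 uses two bytes per umlaut, ISO-8859-1 uses one.
        System.out.println(encode(umlauts, "UTF-8"));      // %C3%9C%C3%84
        System.out.println(encode(umlauts, "ISO-8859-1")); // %DC%C4
    }
}
```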

So why does the serializer use UTF-8 for the URL encoding? In web.xml
I set the container-encoding and form-encoding parameters to ISO-8859-1,
but this changed nothing. The serializer is defined in the sitemap as
follows:

<map:serializer logger="sitemap.serializer.html" mime-type="text/html"
     name="html" pool-grow="4" pool-max="32" pool-min="4"
     src="org.apache.cocoon.serialization.HTMLSerializer">
  <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
  <encoding>ISO-8859-1</encoding>
</map:serializer>

Can you give me any hints on how to get the URLs encoded correctly?
(I need this for subsequent database lookups.)
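
To show why the UTF-8 form is a problem for such a lookup, here is a
small sketch. It assumes the receiving side decodes the path as
ISO-8859-1 and compares it with a key stored as "ÜTest.html". That
setup, the class name DecodeMismatch and the helper decode are
assumptions for illustration only, not the actual lookup code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Cocoon.
public class DecodeMismatch {

    // Decodes a percent-encoded string using the named charset.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String key = "\u00DCTest.html";             // assumed lookup key, ÜTest.html
        String fromSerializer = "%C3%9CTest.html";  // what the serializer writes
        String wanted = "%DCTest.html";             // what I would like to get

        // The ISO-8859-1 form decodes back to the key.
        System.out.println(key.equals(decode(wanted, "ISO-8859-1")));         // true
        // The UTF-8 form decoded as ISO-8859-1 gives two wrong characters.
        System.out.println(key.equals(decode(fromSerializer, "ISO-8859-1"))); // false
        // It only matches if the receiving side decodes as UTF-8.
        System.out.println(key.equals(decode(fromSerializer, "UTF-8")));      // true
    }
}
```

So either the serializer has to write the ISO-8859-1 form, or the
receiving side has to decode as UTF-8.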

Cocoon: Dev-Snapshot from 2004-03-29
Java: 1.4.2_03

Thanks for your help

Harald

-- 
Institut für Tourismus- und Geo-Informationssysteme GmbH
Registered office: Friedrichstrasse 57-59 38855 Wernigerode

Office: Gießerweg 5
        38855 Wernigerode           Web:     http://www.itgis.com
                                    Tel:     03943/557807
                                    Fax:     03943/557808

The Internet Lexicon - a service of ITGIS GmbH:
http://www.knowlex.org

Private: http://www.harald-wehr.de




