cocoon-users mailing list archives

From ". ." <>
Subject System upgrade and now Cocoon is escaping tabs/entities.
Date Tue, 28 Sep 2010 14:09:08 GMT


We've come across a really annoying problem since a server upgrade.

We have a web application based on Cocoon 2.1.6 and Tomcat 5.0.x which has been working fine
for years. Recently we have been having some problems with the physical hardware in our servers,
so we decided to migrate to virtual servers and upgrade some bits and pieces along the way.

Our original application components were:

NetBSD 3.0.3 with Suse 9.x Linux compatibility layer.
Sun JDK 1.4.26
Tomcat 5.0.23
Cocoon 2.1.6

As part of the upgrade we switched to:

CentOS 5.3
Sun JDK 1.6.21
Tomcat 5.0.30
Cocoon 2.1.6

We retained all the original configs and Jars/files for Cocoon and things are running well
except for two problems.

Firstly, if any of our source XML/XSL files use tabs to indent the nodes, the output
escapes them as &#9;, which it didn't do before. This isn't a problem for output to be
displayed in a browser, but we have a number of legacy Flash components which, annoyingly,
don't recognise this as whitespace and refuse to load, causing the Flash component to fail.

Secondly, we have a version of our site using Cyrillic characters, and this was sadly developed
without UTF-8 (I don't know why). We're using some butchered hack to use the windows-1251
character set. What we are getting now is the error:

"org.xml.sax.SAXException: Attempt to output character of integral value 
1057 that is not represented in specified output encoding of 
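
For reference, integral value 1057 is U+0421 (Cyrillic capital letter Es), which windows-1251
can represent but US-ASCII and ISO-8859-1 cannot. The following is only a minimal sketch of
how that could be checked on the new JDK, assuming the windows-1251 charset is installed there;
the class name EncodingCheck and its helper are hypothetical and not part of Cocoon.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Cocoon: asks the JDK whether a charset
// can represent a given Unicode code point.
public class EncodingCheck {

    // Returns true if the named charset has an encoding for the code point.
    static boolean canEncode(String charsetName, int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        int codePoint = 1057; // value taken from the SAXException message
        String[] names = {"windows-1251", "ISO-8859-1", "US-ASCII"};
        for (String name : names) {
            System.out.println(String.format("U+%04X", codePoint)
                    + " encodable in " + name + ": " + canEncode(name, codePoint));
        }
    }
}
```

If windows-1251 reports true here, the exception would suggest the serializer is writing in
some other encoding than the one we intended, though that is a guess rather than a diagnosis.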

I have a theory that the two problems are related and we're keen to try and get the system
working the way it was. If we can solve the whitespace/tab escaping, that's 80% of the battle.

The closest information I've found on the tab escaping problem said to check which XML serializer
we're using; it's "org.apache.cocoon.serialization.XMLSerializer" as defined in sitemap.xmap,
which seems to be the preferred version.

At this point I'm stumped as to what part of our "upgrade" would have caused our output to suddenly
start escaping whitespace.
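
One thing that might help narrow it down is a standalone probe run under both the old and the
new JDK. The sketch below assumes (this is not confirmed for our setup) that the serializer
ends up handing its SAX events to whatever JAXP TransformerFactory the JVM resolves. It prints
that factory's class name and how a tab inside a text node is written out. The class name
TabSerializationProbe is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical probe, not part of Cocoon: shows which JAXP implementation
// is picked up and how it serializes a tab character in element content.
public class TabSerializationProbe {

    // Class name of the TransformerFactory the JVM resolves by default.
    static String factoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Feeds <node>[TAB]indented</node> through an identity TransformerHandler
    // and returns the serialized text.
    static String serializeWithTab() {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            char[] text = "\tindented".toCharArray();
            handler.startDocument();
            handler.startElement("", "node", "node", new AttributesImpl());
            handler.characters(text, 0, text.length);
            handler.endElement("", "node", "node");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization probe failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + factoryClass());
        // Make the tab visible so a literal tab and an escaped one are easy to tell apart.
        System.out.println("Output: " + serializeWithTab().replace("\t", "[TAB]"));
    }
}
```

If the factory class or the output differs between the two JDKs, that would point at the
bundled XSLT/serializer implementation rather than at Cocoon or Tomcat themselves.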

Any ideas?

- J
