cocoon-users mailing list archives

From: Bram Bouwens <bram.bouw...@fredhopper.com>
Subject: Orion converts the XML passed to Cocoon into ISO-8859-1 where this is not wanted
Date: Tue, 05 Nov 2002 09:00:06 GMT
Versions: Orion 1.5.2/1.6.0, Cocoon 2.0.3, JDK 1.3.1_06, Red Hat Linux 7.3.

We have a web application whose JSP pages used to produce HTML directly,
in the UTF-8 encoding, so most languages posed no problems.
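A quick way to see why UTF-8 matters for such an application is an
encode/decode round trip. The sketch below is an illustration, not code
from this application: the class name is hypothetical, and it uses
java.nio.charset.StandardCharsets (Java 7 and later), which JDK 1.3 does
not have; on 1.3 one would pass the charset names "UTF-8" and
"ISO-8859-1" as strings instead. ISO-8859-1 covers only 256 code points,
so a Greek word comes back as question marks, while UTF-8 preserves it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode s with the given charset, then decode the bytes with the same charset.
    // String.getBytes(Charset) replaces unmappable characters with the charset's
    // replacement byte, which for ISO-8859-1 is '?'.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String greek = "\u03a9\u03bc\u03ad\u03b3\u03b1"; // the Greek word "omega"
        String dutch = "Belgi\u00eb";                    // "Belgium" in Dutch
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek));      // true
        System.out.println(roundTrip(greek, StandardCharsets.ISO_8859_1));               // ?????
        System.out.println(roundTrip(dutch, StandardCharsets.ISO_8859_1).equals(dutch)); // true
    }
}
```

Latin-1 text such as the Dutch word survives either encoding; the
difference only shows up for characters outside ISO-8859-1.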

We have now separated the functional part from the visual design: the
JSP pages produce XML, and Cocoon renders it into HTML.

The sitemap.xmap contains this:

...
    <map:match pattern="demo/**.fh">
      <map:generate type="jsp" src="/xmlout/{1}.jsp">
        <map:parameter name="use-request-parameters" value="true"/>
      </map:generate>
      <map:transform src="layout/demo/{1}.xsl"/>
      <map:serialize type="html"/>
    </map:match>
...
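In this pipeline the `**` wildcard is captured as {1}, so a request for
demo/index.fh should run /xmlout/index.jsp through the JSP generator and
then apply layout/demo/index.xsl. The regex-based sketch below models
only that substitution. It is hypothetical and is not Cocoon's own
wildcard matcher, and it assumes the pattern is matched against the
request path relative to the sitemap, without a leading slash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardMapping {
    // Simplified model of the sitemap pattern "demo/**.fh": "**" may span "/"
    // and its match becomes {1}. Illustration only, not Cocoon's matcher.
    private static final Pattern DEMO = Pattern.compile("demo/(.*)\\.fh");

    // The src handed to the JSP generator, or null if the URI does not match.
    static String jspSource(String uri) {
        Matcher m = DEMO.matcher(uri);
        return m.matches() ? "/xmlout/" + m.group(1) + ".jsp" : null;
    }

    // The stylesheet used by the transformer step, or null if the URI does not match.
    static String stylesheet(String uri) {
        Matcher m = DEMO.matcher(uri);
        return m.matches() ? "layout/demo/" + m.group(1) + ".xsl" : null;
    }

    public static void main(String[] args) {
        System.out.println(jspSource("demo/index.fh"));  // /xmlout/index.jsp
        System.out.println(stylesheet("demo/index.fh")); // layout/demo/index.xsl
    }
}
```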

and cocoon.xconf contains:
...
   <jsp-engine logger="core.jsp-engine">
     <parameter name="servlet-class" value="com.evermind.server.http.JSPServlet"/>
     <parameter name="servlet-name" value="*.jsp"/>
   </jsp-engine>
...

When I request the entry page in its XML form, /xmlout/index.jsp, with a
browser (any browser), everything looks fine. It starts with

<?xml version="1.0" encoding="UTF-8"?>

and further on it contains `België' (Dutch for Belgium), with the ë
encoded as the bytes c3 ab (hex), which is the correct UTF-8 encoding.
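The bytes quoted here can be checked directly. A minimal sketch
(hypothetical class name; a modern JDK is assumed for StandardCharsets)
prints the encodings of ë: two bytes, c3 ab, in UTF-8, and the single
byte eb (decimal 235) in ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Format bytes as lowercase, space-separated hex pairs, e.g. "c3 ab".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF)); // mask off sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00eb"; // U+00EB, LATIN SMALL LETTER E WITH DIAERESIS
        System.out.println(hex(e.getBytes(StandardCharsets.UTF_8)));      // c3 ab
        System.out.println(hex(e.getBytes(StandardCharsets.ISO_8859_1))); // eb
    }
}
```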

When I view /demo/index.fh, however, characters like that are garbled.
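One way this kind of garbling can arise, offered as a hypothesis
consistent with the debug output further down rather than a confirmed
diagnosis: the document still declares UTF-8, but the bytes Cocoon
receives are ISO-8859-1. The sketch (hypothetical class name) decodes
both byte sequences as UTF-8. Java's String constructor replaces the
invalid sequence with U+FFFD; a strict XML parser might report an error
instead, so the exact symptom depends on the parser.

```java
import java.nio.charset.StandardCharsets;

class DeclaredVsActual {
    // Decode bytes as UTF-8, which is what the XML declaration promises.
    // new String(byte[], Charset) substitutes U+FFFD for invalid UTF-8 input.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "Belgi\u00eb".getBytes(StandardCharsets.UTF_8);        // ends in c3 ab
        byte[] latin1 = "Belgi\u00eb".getBytes(StandardCharsets.ISO_8859_1); // ends in eb
        System.out.println(decodeAsUtf8(utf8).equals("Belgi\u00eb"));  // true
        System.out.println(decodeAsUtf8(latin1).contains("\ufffd"));   // true
    }
}
```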

I added debug output to
org/apache/cocoon/components/jsp/JSPEngineImpl.java, using a class
MyPrintWriter extends PrintWriter as the writer for
MyServletOutputStream, and forced the following stack trace:

     at org.apache.cocoon.components.jsp.JSPEngineImpl$MyPrintWriter.write(JSPEngineImpl.java:342)
     at org.apache.cocoon.components.jsp.JSPEngineImpl$MyServletOutputStream.write(JSPEngineImpl.java:322)
     at java.io.OutputStream.write(OutputStream.java:97)
     at com.evermind.server.http.EvermindJSPWriter._vr(Unknown Source)
     at com.evermind.server.http.EvermindJSPWriter.flush(Unknown Source)
     at com.evermind.server.http.EvermindJSPWriter.close(Unknown Source)
     at __jspPage6_xmlout_index_jsp._jspService(__jspPage6_xmlout_index_jsp.java:2145)
     at com.orionserver.http.OrionHttpJspPage.service(Unknown Source)
     at com.evermind._ah._rad(Unknown Source)
     at com.evermind.server.http.JSPServlet.service(Unknown Source)
     at org.apache.cocoon.components.jsp.JSPEngineImpl.executeJSP(JSPEngineImpl.java:134)
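A debug hook along the lines described above could look like the
following. MyPrintWriter's actual code is not shown in the post, so this
TracingPrintWriter is a hypothetical stand-in: it records the call stack
on the first single-character write, which is enough to reveal which
code is driving the output.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

class TracingWriterDemo {
    // Hypothetical debug writer: on the first write(int) it captures a stack
    // trace into a string, then passes every character through unchanged.
    static class TracingPrintWriter extends PrintWriter {
        private final StringWriter trace = new StringWriter();
        private boolean traced = false;

        TracingPrintWriter(Writer out) {
            super(out);
        }

        @Override
        public void write(int c) {
            if (!traced) {
                traced = true;
                PrintWriter tw = new PrintWriter(trace);
                new Throwable("first write(int): " + c).printStackTrace(tw);
                tw.flush();
            }
            super.write(c);
        }

        String capturedTrace() {
            return trace.toString();
        }
    }

    public static void main(String[] args) {
        StringWriter target = new StringWriter();
        TracingPrintWriter w = new TracingPrintWriter(target);
        w.write('B'); // char widened to int 66; this call is traced
        w.write('e'); // later calls are not traced
        w.flush();
        System.out.println(target);          // Be
        System.out.print(w.capturedTrace()); // starts with java.lang.Throwable: first write(int): 66
    }
}
```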


The actual output is produced by calling MyServletOutputStream's
write(int b) once for each character, passing the ISO-8859-1 encoding of
that character, sign-extended, as the argument: the ë mentioned above,
which is 235 in ISO-8859-1, arrives as -21. This seems rather silly and
inefficient to me.
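Both observations follow from standard java.io behavior, assuming
MyServletOutputStream overrides only write(int b); the trace suggests
this, since java.io.OutputStream.write appears directly beneath it. The
inherited OutputStream.write(byte[], int, int) loops over the array and
calls write(int) once per byte, and widening a Java byte to int
sign-extends it, so 0xEB arrives as -21. The OutputStream.write(int)
contract uses only the low-order 8 bits, so the byte itself is still
intact. In the sketch, RecordingStream is a hypothetical stand-in:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PerByteWrites {
    // Hypothetical stand-in for MyServletOutputStream: it overrides only write(int),
    // so OutputStream's inherited write(byte[], int, int) calls it once per byte.
    static class RecordingStream extends OutputStream {
        final List<Integer> args = new ArrayList<>();

        @Override
        public void write(int b) {
            // A byte >= 0x80 arrives sign-extended (0xEB as -21); per the
            // OutputStream.write(int) contract only the low 8 bits matter.
            args.add(b);
        }
    }

    // Writes the ISO-8859-1 bytes of s in one bulk call; returns the int arguments seen.
    static List<Integer> recordLatin1(String s) {
        RecordingStream out = new RecordingStream();
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            out.write(bytes, 0, bytes.length); // inherited loop: one write(int) per byte
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected: write(int) never throws here
        }
        return out.args;
    }

    public static void main(String[] args) {
        List<Integer> calls = recordLatin1("Belgi\u00eb");
        int last = calls.get(calls.size() - 1);
        System.out.println(calls.size()); // 6: one call per byte
        System.out.println(last);         // -21
        System.out.println(last & 0xFF);  // 235
    }
}
```

Overriding write(byte[], int, int) in the wrapper to forward whole
arrays would remove the per-byte calls, but it would not change which
encoding Orion's writer chose.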

Most likely the issue can be solved simply by adding a setting to some
configuration file, which would probably be obvious from the source of
com.evermind.server.http.EvermindJSPWriter. Unfortunately, we don't have
that source.

So the obvious question is: how do we fix this?

Bram Bouwens @ Fredhopper.com


---------------------------------------------------------------------
Please check that your question  has not already been answered in the
FAQ before posting.     <http://xml.apache.org/cocoon/faq/index.html>

To unsubscribe, e-mail:     <cocoon-users-unsubscribe@xml.apache.org>
For additional commands, e-mail:   <cocoon-users-help@xml.apache.org>

