cocoon-users mailing list archives

From "Bazeley, John"
Subject RE: Stream Generator / uploading UTF-8 encoded chinese files
Date Fri, 08 Jul 2005 16:11:38 GMT
> Hi,
> you can configure the encoding like this:
> Did you configure the <form-encoding> in web.xml?
> Did you try using the setCharacterEncoding action (at the start of 
> your pipeline)?
> Did you open your document with UltraEdit to see what the encoding is?
> Lionel
> Bazeley, John wrote:
> >Hi all,
> >
> >I'm trying to use the stream generator to upload XML files that 
> >are UTF-8 encoded and contain Chinese characters. Source system
> >is Windows XP and Cocoon is v2.1.7 running on Solaris 9 / Java
> >1.4.2. Whether I use my own pipeline with curl uploading the file
> >or the /samples/stream/process-order pipeline, the results are 
> >the same: the file is returned to me with all the Chinese 
> >characters mangled ('od' shows all the Chinese characters have 
> >been converted to 357 277 275).
> >
> >I have inserted debug into the stream generator and the XML 
> >serialiser, and both think they are using UTF-8 encoding. 
> >
> >Why is my document getting corrupted? What am I doing wrong?
> >
> >The source document has 'encoding="UTF-8"' in the <?xml ... string, 
> >and IE and Firefox both display it correctly and tell me the 
> >encoding is UTF-8, so I am inclined to believe the document is 
> >correctly encoded.
> >
> >All suggestions are welcome.
> >
> >Thanks, John

Some more information for the record that I did not post earlier:

I'm using the version of Jetty that comes bundled with Cocoon 2.1.7 as
the servlet container.
Debugging shows that the uploaded file is saved to disk correctly, so
the corruption happens some time after that.
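
For reference, the octal bytes 357 277 275 are EF BF BD in hex, which is the UTF-8 encoding of U+FFFD, the Unicode replacement character. A decoder emits it when it meets bytes it cannot map, so this pattern usually means the data was decoded with the wrong charset somewhere and then re-encoded as UTF-8. The following is only a minimal sketch of that effect in plain Java, not Cocoon code: the class and method names are made up for illustration, US-ASCII merely stands in for whatever wrong charset might be involved, and it uses java.nio.charset.StandardCharsets, which needs a newer JDK than the 1.4.2 mentioned above.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    /**
     * Decodes UTF-8 bytes with the wrong charset, then re-encodes as UTF-8.
     * Assumption: US-ASCII is only a stand-in for the real wrong charset.
     */
    static byte[] roundTripThroughWrongCharset(byte[] utf8Bytes) {
        // Every byte >= 0x80 is unmappable in US-ASCII and becomes U+FFFD.
        String misdecoded = new String(utf8Bytes, StandardCharsets.US_ASCII);
        // Each U+FFFD is then written out as the three bytes EF BF BD.
        return misdecoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated octal values, similar to 'od' output. */
    static String toOctal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toOctalString(b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is a single Chinese character; it is three bytes in UTF-8.
        byte[] original = "\u4e2d".getBytes(StandardCharsets.UTF_8);
        System.out.println("original: " + toOctal(original));
        System.out.println("mangled:  " + toOctal(roundTripThroughWrongCharset(original)));
    }
}
```

With US-ASCII as the stand-in, each of the three original bytes is replaced separately, so the mangled line shows 357 277 275 three times. A different wrong charset could produce a different count, but the 357 277 275 signature itself points at a decode step rather than at the file on disk.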

I have updated the servlet jar to 2.3, and that did not make things
any better.

My minimal pipeline is:

    <map:match pattern="john/text">
      <map:generate type="stream">
        <map:parameter name="generate-attributes" value="true"/>
        <map:parameter name="form-name" value="my_xmlfile"/>
      </map:generate>
      <map:serialize type="text"/>
    </map:match>

As I stated earlier, the corruption also occurs with the
/samples/stream/process-order sample pipeline.

In my sitemap, I have the text serialiser set to utf-8 thus:
      <map:serializer logger="sitemap.serializer.text" 
        mime-type="text/plain" name="text" pool-max="20" 

Thanks for any help,
