cocoon-dev mailing list archives

From Mathias Brökelmann <>
Subject Re: The encoding nightmare with StreamGenerator
Date Sun, 28 Apr 2002 09:07:55 GMT

I think the problem lies in the servlet engine, which parses the parameters
out of the request. StreamGenerator simply takes the parameters from the
request object.

Tomcat uses ISO-8859-1 as the character encoding if the browser (e.g. IE
or Netscape) does not send the character encoding to the server.
The bad part: this default is hard-coded in Tomcat, so you cannot
configure it (see the Tomcat sources:
org.apache.catalina.connector.RequestBase, method getReader()).
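
To make the effect concrete, here is a minimal Java sketch. It is not
Tomcat code, and the class and method names are hypothetical. It encodes
"è" as UTF-8 (two bytes, 0xC3 0xA8) and decodes those bytes as
ISO-8859-1, the way the container does when no charset is sent, which
yields the two characters "Ã¨". It also shows a commonly used workaround:
because ISO-8859-1 maps every byte to exactly one char, re-encoding the
string with ISO-8859-1 restores the original bytes, which can then be
decoded as UTF-8. This assumes the client really sent UTF-8, which the
server has no way to verify.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not Tomcat code, just the byte/char arithmetic behind the problem.
public class Latin1Default {

    // Encodes text as UTF-8 (what the browser sends) and decodes the bytes
    // as ISO-8859-1 (what the container does when no charset is given).
    static String decodeAsLatin1(String original) {
        byte[] sentByBrowser = original.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    // Workaround, assuming the client really sent UTF-8: ISO-8859-1 maps each
    // byte 0x00-0xFF to exactly one char, so re-encoding restores the bytes.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = decodeAsLatin1("\u00e8");       // "è" as the container sees it
        System.out.println(broken.length());             // two chars: U+00C3, U+00A8
        System.out.println(redecodeAsUtf8(broken).equals("\u00e8"));
    }
}
```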

The only solution I have found is to send the POST not as
application/x-www-form-urlencoded but as multipart/form-data.

You then get the content as raw bytes that the servlet engine has not
already parsed. This should work especially well for XML streams,
because the <?xml version="1.0" encoding="UTF-8"?> declaration
identifies the encoding.
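
A minimal sketch of why raw bytes help, assuming a JAXP parser such as
the one bundled with the JDK. The class and method names are
hypothetical, and this is not StreamGenerator code. Given an InputStream,
the parser reads the encoding declaration itself; given a Reader over
characters that were already (wrongly) decoded, the declared encoding has
no effect and the parser keeps the mangled characters. That would also be
consistent with the observation in the quoted message below that
encoding="BLAH" raises no error, although whether StreamGenerator really
hands the parser a character stream is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing byte-stream and character-stream parsing.
public class ParseFromBytes {

    // The sample document from the thread; \u00e8 is "è".
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><page><abstract>\u00e8</abstract></page>";

    // Byte stream: the parser honours encoding="UTF-8" from the XML declaration.
    static String abstractFromBytes(byte[] body) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(body));
        return doc.getElementsByTagName("abstract").item(0).getTextContent();
    }

    // Character stream: the bytes were already decoded as ISO-8859-1, so the
    // declaration cannot undo the damage and the text stays "Ã¨".
    static String abstractFromLatin1Chars(byte[] body) throws Exception {
        String predecoded = new String(body, StandardCharsets.ISO_8859_1);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(predecoded)));
        return doc.getElementsByTagName("abstract").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] body = XML.getBytes(StandardCharsets.UTF_8);
        System.out.println(abstractFromBytes(body).equals("\u00e8"));
        System.out.println(abstractFromLatin1Chars(body).length());
    }
}
```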

However, StreamGenerator does not seem to be able to handle
multipart/form-data as a content type. Why?

Hope that helps.

Mathias Broekelmann

> -----Original Message-----
> From: Robert Koberg []
> Sent: Sunday, 28 April 2002 00:28
> To:
> Subject: Re: The encoding nightmare with StreamGenerator
> Hi Stefano.
> Is your xsl:output emitting UTF-8 or ISO?
> We have the same problem, and we don't use Cocoon. We use JS to pre-parse
> for these kinds of things - trial and error... :(
> best,
> -Rob
> Stefano Mazzocchi wrote:
> >I have a browser that sends a POST request with:
> >
> >  content-type: application/x-www-form-urlencoded
> >
> >and the hidden field "content" is populated (using client-side
> >JavaScript) with some XML that looks like this:
> >
> >   <?xml version="1.0" encoding="UTF-8"?>
> >   <page>
> >    <title>Title</title>
> >    <abstract>Ã¨</abstract>
> >    ...
> >   </page>
> >
> >The weird "Ã¨" text is the UTF-8 encoded value of [è] (depending on
> >your mail client, you might not see any of the above as I write it,
> >but that is exactly part of the encoding nightmare that UTF was
> >designed to fix... but there is still a long way to go)
> >
> >Now, I use StreamGenerator to get this text, have it parsed, and
> >feed it into my pipeline. So far so good.
> >
> >The problem is that the stupid StreamGenerator doesn't recognize the
> >encoding, because the content-type doesn't define the 'charset'
> >parameter (and IE can't be tweaked to emit it, AFAIK). So it spits out
> >characters "as they are", as if they were single-byte encoded. (I used
> >the LogTransformer to witness this: the same weird 'Ã¨' appears in the
> >logs, with no encoding translation taking place.)
> >
> >It seems that StreamGenerator (or the parser instance it uses) fails
> >to see that 'Ã¨' is not two 8-bit chars but one 16-bit char.
> >
> >I'm positive the bug resides in StreamGenerator: in fact, if I tweak
> >the JavaScript to fill the form content with
> >
> >   <?xml version="1.0" encoding="BLAH"?>
> >
> >the parser doesn't even trigger an error.
> >
> >I'm going to investigate how to patch this since I need it badly! But
> >if you have any suggestions, I'm all ears.
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, email:
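
For the application/x-www-form-urlencoded case in the quoted message, a
minimal sketch of where the decoding goes wrong. The helper names are
hypothetical, and this is not how Tomcat or StreamGenerator is actually
implemented. The form value arrives percent-encoded (è as %C3%A8), and
java.net.URLDecoder turns those bytes into characters using whatever
charset it is given. With no charset parameter in the Content-Type, the
sketch falls back to ISO-8859-1, mirroring the Tomcat default described
above, and yields "Ã¨"; decoding with UTF-8 yields "è".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical demo class: charset selection plus form-field decoding.
public class FormFieldDecoding {

    // Returns the charset parameter of a Content-Type header, or ISO-8859-1
    // when none is present (the default described for Tomcat).
    static String charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    return p.substring("charset=".length()).replace("\"", "").trim();
                }
            }
        }
        return "ISO-8859-1";
    }

    // Decodes one percent-encoded form value; %XX bytes are turned into
    // characters using the given charset.
    static String decodeField(String rawValue, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) throws Exception {
        String raw = "%3Cabstract%3E%C3%A8%3C%2Fabstract%3E";
        String header = "application/x-www-form-urlencoded";   // no charset, as IE sends it
        System.out.println(decodeField(raw, charsetOf(header)).length());
        System.out.println(decodeField(raw, "UTF-8").length());
    }
}
```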
