cocoon-dev mailing list archives

From Mathias Brökelmann <math...@mathias.d2g.com>
Subject AW: The encoding nightmare with StreamGenerator
Date Sun, 28 Apr 2002 09:07:55 GMT
Hi,

I think the problem is the servlet engine which parses the parameters
out of the request. StreamGenerator simply takes the parameters from the
request object.

Tomcat uses ISO-8859-1 as the character encoding if the browser (IE or
Netscape, for example) does not send a character encoding to the server.
The bad part: this is hard-coded in Tomcat, so you cannot configure the
default encoding (see the Tomcat sources,
org.apache.catalina.connector.RequestBase, method getReader()).
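
To make the effect concrete, here is a minimal sketch in plain Java (no
servlet API involved). The class and method names are made up for
illustration. It only assumes that the container decodes the submitted
bytes as ISO-8859-1 when no charset is given, as described above.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decode raw bytes the way a container defaulting to ISO-8859-1 would.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong decoding. ISO-8859-1 maps every byte to exactly one
    // char, so re-encoding gives back the original bytes without loss.
    static String recoverUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E8 (e with grave accent) is the two bytes 0xC3 0xA8 in UTF-8.
        byte[] raw = "\u00e8".getBytes(StandardCharsets.UTF_8);
        String wrong = decodeAsLatin1(raw);
        System.out.println(wrong.length());                       // 2
        System.out.println(recoverUtf8(wrong).equals("\u00e8"));  // true
    }
}
```

The same round trip can be applied to a parameter value that the
container has already decoded, as long as nothing else has touched the
string in between.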

The only solution I have found is to send the POST not as
application/x-www-form-urlencoded but as multipart/form-data.

As a result you get the content as binary, not already parsed by the
servlet engine. This should work especially well for XML streams,
because the <?xml version="1.0" encoding="UTF-8"?> declaration
identifies the encoding.
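
A rough sketch of why binary input helps, using the standard JAXP
parser rather than StreamGenerator itself (the class and method names
are hypothetical): when the parser is handed raw bytes, it reads the
encoding from the XML declaration on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlBytesDemo {
    // Parse the raw request body as bytes. No charset is passed in; the
    // parser detects it from the <?xml ... encoding="..."?> declaration.
    static String parseAbstract(byte[] body) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(body));
        return doc.getElementsByTagName("abstract").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<page><title>Title</title>"
                + "<abstract>\u00e8</abstract></page>";
        // Simulate what arrives on the wire: UTF-8 encoded bytes.
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(parseAbstract(body).equals("\u00e8")); // true
    }
}
```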

However, StreamGenerator does not seem to be able to handle
multipart/form-data as the Content-Type. Why?

Hope that helps.

Mathias Broekelmann

> -----Original Message-----
> From: Robert Koberg [mailto:rob@koberg.com]
> Sent: Sunday, 28 April 2002 00:28
> To: cocoon-dev@xml.apache.org
> Subject: Re: The encoding nightmare with StreamGenerator
> 
> Hi Stefano.
> 
> Is your xsl:output putting out utf-8 or iso?
> 
> We have the same problem not using cocoon. We use JS to pre-parse for
> these kinds of things - trial and error... :(
> 
> best,
> -Rob
> 
> 
> Stefano Mazzocchi wrote:
> 
> >I have a browser that sends a POST request with:
> >
> >  content-type: application/x-www-form-urlencoded
> >
> >and the hidden field "content" is populated (using client-side
> >javascript) with some xml which looks like this
> >
> >   <?xml version="1.0" encoding="UTF-8"?>
> >   <page>
> >    <title>Title</title>
> >    <abstract>è</abstract>
> >    ...
> >   </page>
> >
> >the weird "è" text is the UTF-8 encoded value for [è] (depending on
> >your mail client you might not be getting any of the above as I
> >wrote it, but that's exactly part of the encoding nightmare that UTF
> >was designed to fix... but there is still a long way to go)
> >
> >Now, I use StreamGenerator to get this text, have it parsed and
> >feed my pipeline. So far so good.
> >
> >The problem is that stupid StreamGenerator doesn't recognize the
> >encoding (because the content-type doesn't have the 'charset:' part
> >defined (and IE can't be tweaked to emit that, AFAIK)) so it spits
> >the characters out "as they are" (as if they were ASCII encoded) (I
> >used the LogTransformer to witness this and the same weird 'è'
> >appears in the logs with no encoding translation taking place).
> >
> >It seems that StreamGenerator (or the parser instance it
> >instantiates) fails to see that 'è' is not two 8-bit chars but one
> >16-bit char.
> >
> >I'm positive the bug resides on StreamGenerator: in fact, if I tweak
> >the javascript to fill the form content with
> >
> >   <?xml version="1.0" encoding="BLAH"?>
> >
> >the parser doesn't even trigger an error.
> >
> >I'm going to investigate how to patch this since I need it badly! but
> >if you have any suggestions I'm all ears.
> >
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
> For additional commands, email: cocoon-dev-help@xml.apache.org



