cocoon-dev mailing list archives

From Marc Portier <...@outerthought.org>
Subject Re: [heads up] cocoon's default form-encoding and serialize-encoding are inconsistent.
Date Fri, 31 Oct 2003 16:18:34 GMT


Sylvain Wallez wrote:

> Marc Portier wrote:
> 
>> Hi all,
>>
>> we seem to have a small inconsistency concerning encoding of HTML forms
>>
>> - our HTML serializer by default is using the UTF-8 encoding.
>> (in fact it's set nowhere in the system and is thus left over to xalan 
>> which most likely is going down the easy path of assuming the default 
>> from XML land?)
>>
>> - not setting the form-encoding parameter in cocoon's web.xml defaults 
>> to assuming the browsers are sending the request params in the 
>> ISO-8859-1 encoding (CocoonServlet.java line 500)
> 
> 
> 
> I encountered this problem and discovered that browsers (at least IE6 & 
> Mozilla) send form content using the encoding of the HTML page. But the 
> problem is that no header tells the server about the used encoding.
> 

indeed, this is a known issue, see for instance the servlet 2.3 spec
section SRV 4.9 Request Data Encoding

cocoon even has a mechanism inside to survive the issue on servlet 2.2 installations
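
as a rough sketch of what such a survival mechanism can look like (this is 
an assumption about the general technique, not a copy of cocoon's code): 
when the container has already decoded the request bytes as ISO-8859-1, 
the original bytes can be recovered without loss and decoded again with 
the encoding the browser really used. the class and method names below 
are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class RecodeDemo {
    // simulates a container that blindly decodes the request bytes as ISO-8859-1
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte to exactly one char, so getBytes() gives back
    // the original bytes, which can then be decoded as UTF-8
    static String recode(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9";                       // what the user entered
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8);
        String seen = decodeAsLatin1(onTheWire);          // garbled: 5 chars instead of 4
        System.out.println(seen.length());
        System.out.println(recode(seen).equals(typed));   // true
    }
}
```

this only works because the wrong decoding step was ISO-8859-1; with 
another wrong guess the original bytes may be lost.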

> What is the supposed way of writing portable applications that 
> automagically find the correct encoding?
> 

the supposed way is that you consider the URI contract to cover not 
only the uri and the allowed request-parameters but also the way those 
request params are expected to be encoded!

so you expect the end-users of your application to be setting the 
encoding in their browser according to that contract :-)

in practice this means that
1/ the one generating the html form makes sure he applies that very 
encoding on the way out
2/ we all expect that the browser will do a correct auto-detection and 
that the end-user doesn't (know how to) change that encoding manually 
before submitting the form
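
to make the two points concrete, a minimal sketch assuming the contract 
says UTF-8: the server decodes the urlencoded value with the same 
encoding the form page was serialized in. FORM_ENCODING and decodeParam 
are hypothetical names; java.net.URLDecoder is the only real API used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormContract {
    // hypothetical constant: the encoding agreed on for the page and its form data
    static final String FORM_ENCODING = "UTF-8";

    // decodes one urlencoded request parameter value with the given encoding
    static String decodeParam(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // "na\u00efve" as a browser sends it from a UTF-8 page
        String raw = "na%C3%AFve";
        String ok = decodeParam(raw, FORM_ENCODING);   // 5 chars, as typed
        String bad = decodeParam(raw, "ISO-8859-1");   // 6 chars, garbled
        System.out.println(ok.length() + " vs " + bad.length());
    }
}
```

the second call shows what happens when the server side assumes 
ISO-8859-1 while the page went out as UTF-8.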

the awkward thing is that the HTTP spec has room for letting the browser 
communicate which encoding was used (and a servlet 2.3 implementation 
should take that into account) BUT NONE OF THE BROWSERS DO IT.
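
the room in the spec is the charset parameter on the Content-Type 
header. a small sketch of how a server could honour it when present and 
fall back to the agreed encoding when it is missing (which, as said, is 
the normal case). charsetOf is a hypothetical helper, not something from 
cocoon or the servlet API.

```java
public class ContentTypeCharset {
    // returns the charset parameter of a Content-Type value, or the fallback
    // when the header is absent or carries no charset
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // parameter names are case-insensitive
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // strip optional surrounding quotes
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // what browsers actually send: no charset, so the fallback is used
        System.out.println(charsetOf("application/x-www-form-urlencoded", "ISO-8859-1"));
        // what the spec allows
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", "ISO-8859-1"));
    }
}
```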



sigh, it is the same kind of historic 'wrong' as

- wrong implementations of 302 redirects (http 1.1 introduced 307 to 
allow room for the correct implementation of what http 1.0 intended 302 
to be)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html (see note inside 
10.3.3)

- the wrong spelling of referrer in 'http_referer' (should have been two 
r's)
http://www.google.com/search?q=http_referer+spelling&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8




so, welcome to the web:
we create specs so fast that we can't be bothered with the spelling! (or 
the correct implementation)



Wobbly me doesn't mind that much about the folkloristic spelling part ;-)

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0116284/
mpo@outerthought.org                              mpo@apache.org

