From Marc Portier <>
Subject Re: [heads up] cocoon's defaults form-encoding and serialize-encoding are inconsistent.
Date Sat, 01 Nov 2003 22:10:13 GMT
Reinhard Poetz wrote:

> The parameter CONTEXT_DEFAULT_ENCODING is set in - how
> can I override this value?

you don't:
its value IS NOT the encoding, its value is just the lookup-key inside 
the context to read the DEFAULT_ENCODING

as for the remaining question 'where do I set the value then?'
there currently is a servlet init-param one can set inside the web.xml 
which is called 'form-encoding'

the whole reasoning built up in this thread has been to
1/ use that same setting as the default for our 
text-oriented-serializers (ie anything below AbstractTextSerializer in 
the inheritance chain) in order to avoid, as much as possible, the 
inconsistency we are facing now.

2/ implement this by adding that setting to the Context and letting the 
AbstractTextSerializer be Contextualizable
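
To make the key-versus-value distinction and the two steps above concrete, here is a minimal sketch. It is not the actual patch: a plain Map stands in for the Context, the key string "default-encoding" and the class and method names are made up for illustration, and the iso-8859-1 fallback mirrors the current implicit form-encoding default.

```java
import java.util.HashMap;
import java.util.Map;

class DefaultEncodingSketch {
    // The constant holds the lookup key, NOT the encoding itself.
    // The key string is hypothetical.
    static final String CONTEXT_DEFAULT_ENCODING = "default-encoding";

    // Stand-in for servlet start-up: publish the 'form-encoding'
    // init-param (or the implicit default) in the context.
    static Map<String, Object> buildContext(String formEncodingInitParam) {
        Map<String, Object> context = new HashMap<>();
        String encoding = (formEncodingInitParam != null)
                ? formEncodingInitParam
                : "iso-8859-1"; // implicit form-decoding default
        context.put(CONTEXT_DEFAULT_ENCODING, encoding);
        return context;
    }

    // Stand-in for the serializer: an encoding configured explicitly
    // on the serializer wins, otherwise the context value is used.
    static String resolveEncoding(Map<String, Object> context,
                                  String configuredEncoding) {
        if (configuredEncoding != null) {
            return configuredEncoding;
        }
        return (String) context.get(CONTEXT_DEFAULT_ENCODING);
    }

    public static void main(String[] args) {
        Map<String, Object> context = buildContext("utf-8");
        System.out.println(resolveEncoding(context, null));
        System.out.println(resolveEncoding(buildContext(null), null));
    }
}
```

So the value is never overridden through the constant; it is changed by setting 'form-encoding' in the web.xml, and the serializer only reads it back through the key.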

>>personally I think this patch should come together with a 
>>change to our 
>>web.xml so we rather change the default form-encoding to be 
>>also "utf-8"
> sorry, I don't understand this. Does this mean the general encoding is
> iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> encodings?

by now Joerg and Bruno have been adding enough to the thread to see that 
there are more than just two encodings in this world, and quite 
interestingly: they can all be different :-)

I understand that this can easily become confusing, and that is the main 
reason I didn't want to expand the discussion to any encodings other 
than the ones at hand here.

So as a recap:

Given the fact that today's browser behaviour couples
1. the encoding of the HTML-stream (from server to browser) describing 
the <form>
2. the encoding used to encode the request params in the HTTP-request 
hosting the form-submit (from browser to server),

the web-app-developer is more or less forced to make a decent effort to 
ensure that on the server-side he decodes the request-params with 
the same encoding that was used to serialize the HTML.
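
A small sketch of what goes wrong when the two disagree. It works on raw bytes only and leaves out the percent-encoding step of a real form submit; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormEncodingMismatch {
    // Browser side: the submitted value is turned into bytes using
    // the encoding of the page that carried the <form>.
    static byte[] browserEncode(String value, Charset pageEncoding) {
        return value.getBytes(pageEncoding);
    }

    // Server side: the request-param bytes are decoded using the
    // configured form-encoding.
    static String serverDecode(byte[] raw, Charset formEncoding) {
        return new String(raw, formEncoding);
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9"; // "café"
        byte[] onTheWire = browserEncode(typed, StandardCharsets.UTF_8);

        // Same encoding on both sides: the value survives the round trip.
        String consistent = serverDecode(onTheWire, StandardCharsets.UTF_8);
        System.out.println(consistent.equals(typed));

        // utf-8 page, iso-8859-1 form-decoding: each byte of the
        // two-byte "é" is decoded as a separate character.
        String inconsistent = serverDecode(onTheWire, StandardCharsets.ISO_8859_1);
        System.out.println(inconsistent.equals(typed));
        System.out.println(inconsistent.length() + " chars instead of " + typed.length());
    }
}
```

Pure ASCII input hides the problem, since utf-8 and iso-8859-1 agree on those bytes; it only shows up once users type accented or other non-ASCII characters.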

The above observation made me label our current default-settings for 
both encodings inside Cocoon as 'inconsistent':
- if you don't specify an encoding for the serializer (sitemap.xmap) 
it's utf-8
- if you don't specify an encoding for the form-decoding (web.xml) then 
it is iso-8859-1

To fix this I'd like to:
use the context as described above to communicate the chosen (or 
implicit) form-decoding to the AbstractTextSerializer so it can use that 
as a natural default-encoding (currently there is no such thing as a 
default-encoding for the AbstractTextSerializer resulting in it being 
chosen by xalan)

as a consequence however this would mean that the default-encoding for 
the serializers changes from utf-8 to iso-8859-1

we could take the other path and let the fix go together with changing 
the form-decoding to utf-8

The remaining question being: Which path do people prefer? Are there 
clear arguments to rule out one or the other? do we vote?

PS: I do hope this clears up the confusion?
Marc Portier                  
Outerthought - Open Source, Java & XML Competence Support Center

