cocoon-dev mailing list archives

From Bruno Dumon
Subject Re: [heads up] cocoon's defaults form-encoding and serialize-encoding are inconsistent.
Date Sat, 01 Nov 2003 12:24:45 GMT
On Sat, 2003-11-01 at 12:58, Joerg Heinicke wrote:
> Now I'm confused ...
> With the container encoding all resources are read, i.e. my text files 
> and the request.

Nope, these are two different encodings:

* text files are read according to whatever encoding/locale is
configured in your OS (unless you supply special parameters when
starting the JVM)

* request parameters are always decoded using ISO-8859-1
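
To make the difference concrete, here is a minimal sketch (the class and
method names are made up for illustration). It prints the JVM's default
charset, which on the JVMs discussed here follows the OS locale unless
-Dfile.encoding is passed at startup (recent JDKs default to UTF-8
instead), and shows that the same two bytes decode differently depending
on the charset that is assumed:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultEncodingSketch {
    // Hypothetical helper: decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Used when text files are read without an explicit encoding.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // The UTF-8 encoding of U+00E9 (e with acute accent) is 0xC3 0xA9.
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};

        // Decoded as UTF-8 this is one character ...
        System.out.println(decode(bytes, StandardCharsets.UTF_8).length());
        // ... decoded as ISO-8859-1 (what the container does for request
        // parameters) it becomes two characters.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```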

See also section 4.9 in the servlet 2.3 spec:

-- begin quote
Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be "ISO-8859-1", if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method. If the client hasn't set character
encoding and the request data is encoded with a different encoding than
the default as described above, breakage can occur. To remedy this
situation, a new method setCharacterEncoding(String enc) has been added
to the ServletRequest interface. Developers can override the character
encoding supplied by the container by calling this method. It must be
called prior to parsing any post data or reading any input from the
request. Calling this method once data has been read will not affect the
encoding.
-- end quote

Since the setCharacterEncoding method mentioned above has not been
supported for long (and must be called before any request parameter is
read), Cocoon has its own mechanism to fix this, which does something like:

new String(value.getBytes(container_encoding), form_encoding);
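
As a runnable illustration of that one-liner, here is a minimal sketch. It
is not Cocoon's actual code: the class name, the recode helper and the
sample value are made up for the example, and it assumes a browser that
submitted the form as UTF-8 while the container decoded it as ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RecodeSketch {
    // Hypothetical helper: turn the wrongly decoded string back into the
    // original bytes, then decode those bytes with the real form encoding.
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // What the browser put on the wire: U+00E9 encoded as UTF-8.
        byte[] wire = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // What the container hands to the application after decoding the
        // bytes as ISO-8859-1: two characters instead of one.
        String fromContainer = new String(wire, StandardCharsets.ISO_8859_1);

        String fixed = recode(fromContainer,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(fromContainer.length()); // two chars before recoding
        System.out.println(fixed.equals("\u00e9")); // original character restored
    }
}
```

This only works because ISO-8859-1 maps every byte value to exactly one
character, so getBytes gives back the bytes the client sent unchanged.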

container_encoding should always be ISO-8859-1 (unless you have a broken
servlet container), and form_encoding should be the same one as on your

>  The form encoding only recodes the request parameters 
> to the expected (i.e. container) encoding. So it works like a servlet 
> filter.
> Joerg
> On 01.11.2003 12:36, Bruno Dumon wrote:
> > On Sat, 2003-11-01 at 12:24, Joerg Heinicke wrote:
> > 
> >>On 01.11.2003 12:08, Reinhard Poetz wrote:
> >>
> >>
> >>>>personally I think this patch should come together with a 
> >>>>change to our 
> >>>>web.xml so we rather change the default form-encoding to be 
> >>>>also "utf-8"
> >>>
> >>>
> >>>sorry, I don't understand this. Does this mean the general encoding is
> >>>iso-8859-1 and the form encoding is UTF-8? If yes, why two different
> >>>encodings?
> >>
> >>These are two different things.
> >>
> >>On the one hand there is the container encoding. It defines with which 
> >>encoding textfiles are read, e.g. properties files. It's about servlet 
> >>container <=> file system.
> >>
> > 
> > 
> > The "container encoding" mentioned here is the encoding with which the
> > servlet container decoded request parameters. The servlet spec says that
> > this should always be ISO-8859-1 (unless the client specified another
> > encoding or, from 2.3, request.setCharacterEncoding is used). This
> > parameter has nothing to do with the encoding used to decode e.g. text
> > files, and should normally always be left to ISO-8859-1.
> > 
> > Some more info about all this can be found on this wiki page:
> >
> > 
> > 
> >>On the other hand there is the form encoding. It defines with which 
> >>encoding requests are read. It's about servlet container <=> clients.
> >>
> >>I hope it's correct so.
Bruno Dumon                   
Outerthought - Open Source, Java & XML Competence Support Center                
