tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Yet again a problem with POSTs and encodings
Date Mon, 03 Dec 2007 15:55:54 GMT

Александър,

Александър Шопов wrote:
> My problem is that I am trying to POST non-ASCII data to Tomcat, but it
> gets mangled into the ISO-8859-1 interpretation of the UTF-8 byte sequence.

[snip]

> in server.xml I have put:
> URIEncoding="UTF-8" in the conf/server.xml files.

Note that this only affects the character encoding used to decode the
URL, including GET query-string parameters; it does not affect POST bodies.

> 30089 pts/3    Sl     0:03 /opt/jdk1.5.0_13/bin/java
> -Dfile.encoding=UTF-8

Good to know, but might not be enough. The browser can still send the
wrong character encoding.

> <%@ page language="java" contentType="text/html;charset=UTF-8"%>

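Always good to set the charset. Bear in mind, though, that this sets the
encoding of the response; it does not tell Tomcat how to decode the
request that comes back.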

> However - when I change the method to GET or simply do a GET to the
> resource with parameters - everything works fine - Cyrillic gets decoded
> just fine.

This is probably because the encoding of the request body is either
wrong or unset:

> Content-Type: application/x-www-form-urlencoded

Note that no charset is specified here.

The code for Tomcat's HTTP connector (in 5.5.23, which is the source I
have in front of me) delegates the detection of the request's character
encoding to the ContentType.getCharsetFromContentType method, which has
this comment:

    // Basically return everything after ";charset="
    // If no charset specified, use the HTTP default (ASCII) character set.

In practice, the code returns null when there is no character set (more
precisely, when there is no ';' in the content type).
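To make that behaviour concrete, here is a minimal stand-alone sketch of
a charset extractor that follows the same rule: return whatever follows
"charset=" after the first ';', and null otherwise. The class and method
names (ContentTypeCharset, charsetOf) are hypothetical; this is not
Tomcat's source, and it also strips optional quotes, which may differ
from what 5.5.23 actually does.

```java
import java.util.Locale;

public final class ContentTypeCharset {

    // Hypothetical helper mimicking the behaviour described above; NOT Tomcat's code.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        if (semi < 0) {
            return null; // no parameters at all, so no charset (and no "ASCII" default)
        }
        String params = contentType.substring(semi + 1);
        int idx = params.toLowerCase(Locale.ROOT).indexOf("charset=");
        if (idx < 0) {
            return null; // parameters present, but none of them is charset
        }
        String cs = params.substring(idx + "charset=".length());
        int end = cs.indexOf(';');
        if (end >= 0) {
            cs = cs.substring(0, end); // stop at the next parameter, if any
        }
        cs = cs.trim();
        if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
            cs = cs.substring(1, cs.length() - 1); // charset="UTF-8" is also legal
        }
        return cs.isEmpty() ? null : cs;
    }

    public static void main(String[] args) {
        // The header from the original post: no charset
        System.out.println(charsetOf("application/x-www-form-urlencoded")); // prints null
        // What a well-behaved client would send instead
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8")); // prints UTF-8
    }
}
```

The point is simply that a missing charset comes back as null rather
than as any particular default; the default is decided elsewhere.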

So, it's time to turn to the servlet spec. Section 4.9 of the 2.4
specification states:

"
Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be “ISO-8859-1” if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method.
"

So, there's your ISO-8859-1 default.
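To see exactly what that default does to your Cyrillic, here is a small
sketch (Java 10+, for the Charset overloads of URLEncoder/URLDecoder)
that percent-encodes a sample value as UTF-8, the way a browser would
for a UTF-8 page, and then decodes it once as ISO-8859-1 and once as
UTF-8. The class name, the "name" field and the sample word are
hypothetical; only the decoding behaviour is the point.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class PostDecodingDemo {

    // Hypothetical sample value ("Здравей"), written as escapes so the
    // source file's own encoding cannot interfere.
    static final String ORIGINAL = "\u0417\u0434\u0440\u0430\u0432\u0435\u0439";

    // Roughly what a browser posts from a UTF-8 page: name=<UTF-8 bytes, percent-encoded>
    static String browserBody() {
        return "name=" + URLEncoder.encode(ORIGINAL, StandardCharsets.UTF_8);
    }

    // Decode the value of a single "name=value" pair with the given charset.
    static String decodeValue(String body, Charset charset) {
        String value = body.substring(body.indexOf('=') + 1);
        return URLDecoder.decode(value, charset);
    }

    public static void main(String[] args) {
        String body = browserBody();
        String asLatin1 = decodeValue(body, StandardCharsets.ISO_8859_1); // the spec default
        String asUtf8 = decodeValue(body, StandardCharsets.UTF_8);        // what the page used

        System.out.println(body);
        System.out.println(asLatin1.equals(ORIGINAL)); // false: each UTF-8 byte became its own char
        System.out.println(asUtf8.equals(ORIGINAL));   // true

        // ISO-8859-1 maps every byte to exactly one char, so the original bytes
        // survive and can be re-decoded. A workaround, not a fix for the missing charset.
        String repaired = new String(asLatin1.getBytes(StandardCharsets.ISO_8859_1),
                                     StandardCharsets.UTF_8);
        System.out.println(repaired.equals(ORIGINAL)); // true
    }
}
```

That byte-for-byte mapping is also why the symptom looks like "UTF-8
read as Latin-1" rather than random garbage.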

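Most people get around this by using the CharacterEncodingFilter that is
often discussed on this list. I believe that Spring includes an
implementation, but it's pretty easy to write yourself, too: simply
check the character encoding of the request and, if it's null (or
blank), call request.setCharacterEncoding and set it to whatever makes
sense (usually UTF-8, to match the charset of the page containing the
form). The filter has to run before anything reads a request parameter,
because setCharacterEncoding has no effect once the body has been parsed.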
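A minimal sketch of the core check, assuming nothing beyond the JDK: the
EncodableRequest interface and FakeRequest class are hypothetical
stand-ins for the two javax.servlet.ServletRequest methods involved, so
this compiles without the servlet API. In a real webapp the same lines
would sit inside a javax.servlet.Filter's doFilter(ServletRequest,
ServletResponse, FilterChain), before chain.doFilter(request, response).

```java
import java.io.UnsupportedEncodingException;

public final class EncodingDefaulting {

    // Hypothetical stand-in for the two ServletRequest methods used here.
    interface EncodableRequest {
        String getCharacterEncoding();
        void setCharacterEncoding(String env) throws UnsupportedEncodingException;
    }

    // Core of a CharacterEncodingFilter: only supply an encoding when the
    // client did not send one. Must run before getParameter()/getReader().
    static void applyDefault(EncodableRequest request, String defaultEncoding) {
        String current = request.getCharacterEncoding();
        if (current == null || current.trim().isEmpty()) {
            try {
                request.setCharacterEncoding(defaultEncoding);
            } catch (UnsupportedEncodingException e) {
                // An unknown default charset is a configuration mistake, not a client error
                throw new IllegalStateException("Unsupported default encoding: " + defaultEncoding, e);
            }
        }
    }

    // Minimal in-memory request, for demonstration only (hypothetical)
    static final class FakeRequest implements EncodableRequest {
        private String encoding;
        FakeRequest(String encoding) { this.encoding = encoding; }
        @Override public String getCharacterEncoding() { return encoding; }
        @Override public void setCharacterEncoding(String env) { this.encoding = env; }
    }

    public static void main(String[] args) {
        FakeRequest missing = new FakeRequest(null);          // like the POST in the question
        FakeRequest explicit = new FakeRequest("ISO-8859-2"); // client declared its charset
        applyDefault(missing, "UTF-8");
        applyDefault(explicit, "UTF-8");
        System.out.println(missing.getCharacterEncoding());  // UTF-8
        System.out.println(explicit.getCharacterEncoding()); // ISO-8859-2
    }
}
```

Leaving an explicitly declared charset alone is the design choice here:
the filter only fills the gap the spec leaves, it doesn't second-guess a
client that actually sent a charset.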

Hope that helps,
-chris


---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

