tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Tomcat5.0.28 character encoding problem
Date Wed, 25 Jul 2007 16:09:06 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joe,

Joe Russo wrote:
> I am in the process of converting from using JRUN to Tomcat

Good for you! Welcome to the community.

> I have
> ran into the problem where these funky symbols are displaying.  I can
> not find any stack traces that would explain or possibly clue into a
> solution.  

Right. These things (encoding problems) hardly ever generate errors;
they just exhibit unexpected behavior.

> My questions are:  
> Does Tomcat have problems with any types of encoding?      

Yes and no. Tomcat behaves exactly as the HTTP specification mandates.
That is, it interprets all incoming data using the ISO-8859-1 character
encoding unless the request states otherwise (in the Content-Type
header). Some browsers don't send the encoding along with the
Content-Type, so Tomcat falls back to ISO-8859-1 even when the data was
actually sent in another encoding.
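
To make that rule concrete, here is a minimal sketch in Java of how a
charset could be picked from a Content-Type header, with ISO-8859-1 as
the fallback. This is not Tomcat's actual code; the class RequestCharset
and the method fromContentType are hypothetical names used only for
illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RequestCharset {

    // Hypothetical helper (not part of Tomcat): returns the charset named
    // in a Content-Type header value, or ISO-8859-1 when none is usable.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by ';'
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported name: use the default below
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // The encoding is stated, so it is used
        System.out.println(fromContentType(
            "application/x-www-form-urlencoded; charset=UTF-8"));
        // No encoding stated: the default applies
        System.out.println(fromContentType(
            "application/x-www-form-urlencoded"));
        // No Content-Type at all (e.g. a plain GET): the default applies
        System.out.println(fromContentType(null));
    }
}
```

The point is only that a missing charset silently selects the default;
nothing in the request tells the server that the default is wrong.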

Some browsers only send an encoding when there is POST data, since the
Content-Type only really makes sense when there is request content (the
POST data). Unfortunately, the browser usually encodes the URL using the
same encoding it would have declared in the request's Content-Type.
So, if a browser uses UTF-8 to encode the URL (which is typical these
days), but doesn't send a Content-Type header (or leaves out the
encoding), then Tomcat incorrectly interprets the URL as ISO-8859-1, and
you get funny characters.
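
Here is a small stand-alone Java sketch of that mismatch, using only
java.net.URLDecoder from the standard library (not Tomcat's own decoding
code). The class name UrlDecodingDemo and the sample string are made up
for illustration: a right single quotation mark (U+2019) is
percent-encoded as UTF-8 and then decoded both ways.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodingDemo {

    // Percent-decodes a URL component using the named charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // U+2019 encoded as UTF-8 is the three bytes E2 80 99
        String encoded = "it%E2%80%99s";

        String asUtf8 = decode(encoded, "UTF-8");
        String asLatin1 = decode(encoded, "ISO-8859-1");

        // Decoded as UTF-8: the three bytes become one character
        System.out.println(asUtf8.length());    // 4
        // Decoded as ISO-8859-1: each byte becomes its own character
        // (an a-circumflex followed by two invisible control characters)
        System.out.println(asLatin1.length());  // 6
        System.out.println((int) asLatin1.charAt(2)); // 226, i.e. 0xE2
    }
}
```

The bytes on the wire are identical in both cases; only the charset
assumed by the decoder differs.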

It's not Tomcat's fault. It's not really the browser's fault, either.
It's the HTTP spec's fault, since the character encoding used in URLs
isn't explicitly laid out. :(

> What type of characters are being displayed below and any advice in
> troubleshooting or solving this would be gratefully appreciated.

The presence of the 'â' character looks to me like a UTF-8 URL being
interpreted as an ISO-8859-1 URL. Try searching Google for
CharacterEncodingFilter and take a look at that. It tries to recover
from requests that don't include a character encoding. You should also
look at the "URIEncoding" attribute of the <Connector> element. You can
set the encoding to something other than the default (ISO-8859-1).

For more information, see:

http://tomcat.apache.org/faq/misc.html#tomcat5CharEncoding
http://tomcat.apache.org/faq/connectors.html#utf8
http://tomcat.apache.org/tomcat-5.0-doc/config/ajp.html (if you use JK)
http://tomcat.apache.org/tomcat-5.0-doc/config/http.html (if you don't)

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGp3Wi9CaO5/Lv0PARAm9kAJ0Sb2P15mo+x5IUQZBiP1laJKCI3gCdFcO3
W0t6lz0jMzyvRsPK3BTBaXE=
=uAOC
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

