tomcat-users mailing list archives

From nch <underscore_...@yahoo.com>
Subject Re: Character encoding
Date Wed, 18 Jun 2008 20:56:58 GMT
You say:
Tomcat does not use any environment variables. The only settings that
affect the interpretation of the URI are the "URIEncoding" and
"useBody..." settings on the <Connector>. Are you using more than one
connector? Are you using Apache httpd out in front of Tomcat?

Perhaps the JVM does, and Tomcat reads them indirectly through it?
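The hunch has some basis. Tomcat's URI decoding is controlled by the <Connector> attributes, but the JVM picks a default charset from the OS locale (on Linux typically via LANG or LC_ALL; since JDK 18 the default is UTF-8 regardless). Any application code that converts bytes and strings without naming a charset uses that default. A small diagnostic sketch to run on both the Windows and the Debian machine, assuming nothing beyond the standard library:

```java
import java.nio.charset.Charset;

public class DefaultCharsetInfo {
    // The charset used by charset-less calls such as new String(byte[])
    // or String.getBytes().
    public static Charset defaultCharset() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset(): " + defaultCharset());
        System.out.println("file.encoding:            " + System.getProperty("file.encoding"));
        // Locale variables that usually feed the JVM default on Linux;
        // either may be unset, in which case "null" is printed.
        System.out.println("LANG:                     " + System.getenv("LANG"));
        System.out.println("LC_ALL:                   " + System.getenv("LC_ALL"));
    }
}
```

If the two machines print different defaults, any code path that omits an explicit charset will behave differently on each.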

Cheers



----- Original Message ----
From: Christopher Schultz <chris@christopherschultz.net>
To: Tomcat Users List <users@tomcat.apache.org>
Sent: Wednesday, June 18, 2008 9:42:21 PM
Subject: Re: Character encoding


nch,

nch wrote:
| - I do remote debugging through Eclipse to both tomcat on windows
| (same machine as eclipse, though) and tomcat on debian.

Okay, remote debugging should not affect the server, but I'm still
wondering if the server.xml you think you are using is the one actually
being used. Try setting the <Connector> port to something crazy like
12345 and restarting. If you can still contact the server, then you are
either editing the wrong server.xml (there should only be one!) or your
changes are not being picked up.
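The "crazy port" test can also be checked from code rather than a browser. A minimal sketch, assuming the server runs on localhost; the ports 8080 and 12345 below are illustrative values only:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if something accepts TCP connections on host:port within
    // timeoutMillis; false on refusal, timeout, or an unresolvable host.
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // After changing the <Connector> port and restarting, the old port
        // should stop answering and the new one should start.
        System.out.println("8080:  " + isListening("localhost", 8080, 2000));
        System.out.println("12345: " + isListening("localhost", 12345, 2000));
    }
}
```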

| - When I send "piraña" it is always encoded into the URL as
| "pira%C3%B1a", whether running tomcat on windows, debian or even
| running my app in Jetty.

That's because your browser is encoding it, not the server. So, it
doesn't depend on the server configuration (except possibly for the page
encoding, which often directs the browser to use utf-8 URI encoding).
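The percent-escapes record which charset the browser chose: "%C3%B1" is the two-byte UTF-8 form of "ñ", while an ISO-8859-1 browser would send the single byte "%F1". A short sketch with java.net.URLEncoder makes the difference visible (the word is written as a Unicode escape so the source-file encoding cannot interfere):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeDemo {
    // Percent-encodes text using the named charset.
    public static String encode(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String word = "pira\u00f1a"; // "piraña"
        System.out.println(encode(word, "UTF-8"));      // pira%C3%B1a
        System.out.println(encode(word, "ISO-8859-1")); // pira%F1a
    }
}
```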

| - If I type "piraña" on http://www.us-webmasters.com/Decode-URLs/ and
| switch browser encoding display between ISO-8859-1 and UTF-8, I can
| see that when ISO-8859-1, then it displays "piraña", when UTF-8, it
| displays "piraña".

I'm not sure what you think you're doing, there. When I paste that word
into the box to decode, I get broken output. There is no indication as
to what encoding the server expects for URIs.

Switching browser interpretation of the resulting page does not seem to
prove anything. The server never advertises any encoding to use, so the
browser just chooses whatever it wants. My browser chooses ISO-8859-1.
When I switch it to UTF-8, I see the expected interpretation. I'm not
sure what I just learned.
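What the browser's encoding menu changes is only how the same bytes are read back. A sketch of that effect: the UTF-8 bytes of "piraña" read as ISO-8859-1 turn 0xC3 0xB1 into the two characters "Ã±", while reading them as UTF-8 gives the intended word:

```java
import java.nio.charset.StandardCharsets;

public class InterpretBytes {
    // UTF-8 bytes of "piraña": 'p' 'i' 'r' 'a' 0xC3 0xB1 'a'
    static final byte[] UTF8_BYTES = "pira\u00f1a".getBytes(StandardCharsets.UTF_8);

    // Each byte becomes one char, so 0xC3 0xB1 shows up as "Ã±".
    public static String asLatin1() {
        return new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
    }

    // The two bytes are combined back into the single char 'ñ'.
    public static String asUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(asLatin1().length()); // 7
        System.out.println(asUtf8().length());   // 6
    }
}
```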

| - Something could be wrong in my debian environment. How can I find
| out which environment variables tomcat is using?

Tomcat does not use any environment variables. The only settings that
affect the interpretation of the URI are the "URIEncoding" and
"useBody..." settings on the <Connector>. Are you using more than one
connector? Are you using Apache httpd out in front of Tomcat?
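For context: in Tomcat releases of this era the Connector's URIEncoding defaults to ISO-8859-1, so a UTF-8 query arrives as "piraÃ±a" unless URIEncoding="UTF-8" is set. Fixing the Connector is the cleaner route; the sketch below only illustrates the common application-side workaround, assuming the parameter was decoded as ISO-8859-1 (the sample value is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class RepairParam {
    // ISO-8859-1 maps every byte to exactly one char, so getBytes(ISO_8859_1)
    // recovers the original request bytes, which are then decoded as UTF-8.
    public static String latin1MisdecodedToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical value as returned with the default URIEncoding.
        String fromTomcat = "pira\u00c3\u00b1a"; // "piraÃ±a"
        System.out.println(latin1MisdecodedToUtf8(fromTomcat).equals("pira\u00f1a")); // true
    }
}
```

Applying this repair when the Connector is already set to UTF-8 would corrupt the value instead, which is one more reason to check which server.xml is in effect.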

| - If I try to manually decode the returned parameter in my
| controller by using URLDecoder.decode(query, "UTF-8") then I can see
| no difference. That is, when debugging the tomcat on windows the
| result is "piraña" while debugging the one on debian the result is
| "piraña".

So, running this:

URLDecoder.decode(URLEncoder.encode("piraña", "UTF-8"), "UTF-8");

...gives you "piraña" on your debian system? That doesn't seem right.

| - Is URLDecoder#decode environment dependent?

Nope. As long as you always provide the encoding to be used, you should
be fine.
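To illustrate: the one-argument URLDecoder.decode(String) is deprecated precisely because its result depends on the platform default, whereas the two-argument form gives the same result on every machine. A minimal wrapper sketch (the class and method names are illustrative):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeExplicit {
    // Always names the charset, so the result never depends on the
    // JVM's default charset.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Matching charset: both yield "piraña".
        System.out.println(decode("pira%C3%B1a", "UTF-8").equals("pira\u00f1a"));    // true
        System.out.println(decode("pira%F1a", "ISO-8859-1").equals("pira\u00f1a")); // true
        // Mismatched charset: UTF-8 bytes read as ISO-8859-1 give "piraÃ±a".
        System.out.println(decode("pira%C3%B1a", "ISO-8859-1").equals("pira\u00c3\u00b1a")); // true
    }
}
```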

-chris



      