tomcat-users mailing list archives

From nch <underscore_...@yahoo.com>
Subject Re: Character encoding
Date Wed, 18 Jun 2008 19:09:39 GMT

More info on this:

- I do remote debugging through Eclipse to both tomcat on windows (same machine as eclipse,
though) and tomcat on debian.

- I open a debugging port on tomcat by setting CATALINA_OPTS=-Xmx1024m -Xdebug -Xnoagent -Djava.compiler=NONE
-Xrunjdwp:transport=dt_socket,address=4501,server=y,suspend=n

- When I send "piraña" it is always encoded into the URL as "pira%C3%B1a", whether running
tomcat on windows, debian or even running my app in Jetty.

- When I send "piraña", if I'm debugging tomcat on windows I can read "piraña".

- If tomcat is running on debian, I read "piraÃ±a".

- If I type "piraña" on http://www.us-webmasters.com/Decode-URLs/ and switch browser encoding
display between ISO-8859-1 and UTF-8, I can see that with ISO-8859-1 it displays "piraÃ±a",
and with UTF-8 it displays "piraña".

- When I run/debug my app on Jetty I get "piraña" (I've read on the web that Jetty decodes
to UTF-8 by default).

- Something could be wrong in my debian environment. How can I find out which environment
variables tomcat is using?

- If I try to manually decode the returned parameter in my controller
by using URLDecoder.decode(query, "UTF-8") then I can see no
difference. That is, when debugging the tomcat on windows the result is
"piraña" while debugging the one on debian the result is "piraÃ±a".

- Is URLDecoder#decode environment dependent?
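
On the last few points, a minimal sketch that may help narrow this down. As far as the JDK documentation describes it, URLDecoder.decode(String, String) with an explicit charset name does not consult the platform default, so it should behave the same on windows and debian. Note also that a value returned by the request has normally been percent-decoded by the container already, so it has no %XX sequences left for URLDecoder to act on, which would explain seeing no difference. The class name DecodeCheck and its helper method are hypothetical, and the environment variables printed are only the ones that seem most likely to matter here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;

// Hypothetical helper class; the name and methods are illustrative only.
class DecodeCheck {

    // Percent-decodes a raw query value using an explicitly named charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Settings that commonly differ between a windows and a debian JVM.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("LANG            = " + System.getenv("LANG"));
        System.out.println("CATALINA_OPTS   = " + System.getenv("CATALINA_OPTS"));

        String raw = "pira%C3%B1a";
        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        // The same escaped bytes give different strings per charset.
        System.out.println(asUtf8.length());   // 6 characters
        System.out.println(asLatin1.length()); // 7 characters

        // Decoding a value that has no percent escapes left changes nothing.
        System.out.println(decode(asLatin1, "UTF-8").equals(asLatin1));
    }
}
```

Running the same lines from inside the webapp (for example in the controller, writing to the log) on both machines should show whether the two tomcat JVMs really see different settings.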

Hope this is useful. Lots of thanks to you all.



----- Original Message ----
From: Christopher Schultz <chris@christopherschultz.net>
To: Tomcat Users List <users@tomcat.apache.org>
Sent: Wednesday, June 18, 2008 7:25:03 PM
Subject: Re: Character encoding

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

nch,

nch wrote:
| I have a form that has an input field named "query". I type "piraña"
| and submit the form using the GET method. I can see the browser has
| encoded this parameter into the URI as query=pira%C3%B1a

Is this a correct UTF-8 encoding of the parameter? I don't have my
unicode conversion chart handy right now.
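
For what it's worth, this can be checked without a chart. A minimal sketch, assuming a plain JDK is at hand; the class name EncodeCheck is hypothetical. It asks java.net.URLEncoder to produce the UTF-8 percent-encoding of the same word so it can be compared with what the browser sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; the name is illustrative only.
class EncodeCheck {

    // Percent-encodes a value as UTF-8, the way a form field would be sent.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00f1" is n with tilde; its UTF-8 bytes are 0xC3 0xB1.
        System.out.println(encodeUtf8("pira\u00f1a")); // prints pira%C3%B1a
    }
}
```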

| I set a breakpoint

Stop right there. If you are executing TC through a debugger, are you
sure that it is using its standard server.xml configuration?

| into the filter so when the request hits the filter I can see
| getCharacterEncoding() returns null. The filter sets it to "UTF-8".

FYI, this has no bearing on the interpretation of the URI.

| Then the request gets to the controller where I can see the request
| parameter "query" is set to "piraÃ±a".

Just in case it doesn't go through email very well, I see "pir" followed
by an A with a tilde over it, followed by a +/- symbol, followed by an
"a". Definitely not right. Is that what you'd expect if you improperly
interpreted the UTF-8, URL-encoded "piraña" as if it were ISO-8859-1?
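
One way to test that guess is to reproduce the mistake on purpose. The following is only a sketch with a hypothetical class name (MojibakeDemo): it takes the UTF-8 bytes of the word, reads them back as ISO-8859-1, and then reverses the step. It needs Java 6 for the Charset-based String methods.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; the name and methods are illustrative only.
class MojibakeDemo {

    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    // Reads UTF-8 bytes as if they were ISO-8859-1 (the suspected mistake).
    static String misread(String original) {
        return new String(original.getBytes(UTF_8), LATIN_1);
    }

    // Reverses the mistake: recover the raw bytes, then read them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(LATIN_1), UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misread("pira\u00f1a");
        // The single n-with-tilde becomes two characters: U+00C3 and U+00B1,
        // i.e. A with tilde followed by the plus-minus sign.
        System.out.println(garbled.equals("pira\u00c3\u00b1a")); // prints true
        System.out.println(repair(garbled).equals("pira\u00f1a")); // prints true
    }
}
```

If the garbled value matches, the bytes arrived intact and only the decoding step is wrong. In Tomcat versions of this era the connector decodes the URI as ISO-8859-1 unless the URIEncoding attribute is set on the Connector in server.xml, so that setting is worth comparing between the two installations.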

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhZRO8ACgkQ9CaO5/Lv0PBXBQCeP3YKqnpJDO65N8lfvO9ThPhr
Nr8AnRbPC1BxIEOXqIOrMCS1ACy7YFU6
=y8/w
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


      