tomcat-users mailing list archives

From nch <underscore_...@yahoo.com>
Subject Re: Character encoding
Date Wed, 18 Jun 2008 20:49:46 GMT

Chris, thanks for your help.
Please see my comments below.
Kind regards.



----- Original Message ----
From: Christopher Schultz <chris@christopherschultz.net>
To: Tomcat Users List <users@tomcat.apache.org>
Sent: Wednesday, June 18, 2008 9:42:21 PM
Subject: Re: Character encoding

nch,

nch wrote:
| | - I do remote debugging through Eclipse to both Tomcat on Windows
| | (same machine as Eclipse, though) and Tomcat on Debian.

| Okay, remote debugging should not affect the server, but I'm still
| wondering if the server.xml you think you are using is the one actually
| being used. Try setting the <Connector> port to something crazy like
| 12345 and restarting. If you can still contact the server, then you are
| either editing the wrong server.xml (there should only be one!) or your
| changes are not being picked up.

I'll try.
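Chris's port test can also be checked from the same machine with a plain TCP probe. This is a minimal Java sketch, assuming Tomcat runs locally; `PortProbe` and `isListening` are made-up names, and a successful connect only shows that *something* is listening on the port, not that it is Tomcat.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Hypothetical helper: true if something accepts a TCP connection on host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: nothing reachable there.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8080 is the port used in this thread; 12345 is the "crazy" test port.
        System.out.println("8080:  " + isListening("localhost", 8080, 2000));
        System.out.println("12345: " + isListening("localhost", 12345, 2000));
    }
}
```

After changing the `<Connector>` port to 12345 and restarting, you would expect 8080 to report false and 12345 to report true; if 8080 still answers, the edited server.xml is probably not the one being loaded.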

| | - When I send "piraña", it is always encoded in the URL as
| | "pira%C3%B1a", whether I run Tomcat on Windows or Debian, or even
| | run my app in Jetty.

| That's because your browser is encoding it, not the server. So, it
| doesn't depend on the server configuration (except possibly for the page
| encoding, which often directs the browser to use utf-8 URI encoding).

But if the URL is always encoded the same way, and Tomcat receives no other information
about which character encoding to decode it with, why do I get different values from Tomcat?
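It helps to look at what "pira%C3%B1a" actually contains: the two UTF-8 bytes of "ñ" (0xC3 0xB1) written as %XX escapes, with no charset label attached. A minimal sketch using java.net.URLEncoder to reproduce that encoding (an assumption: browsers and URLEncoder differ on some characters, such as spaces, but not for this word); `BrowserEncoding` and `encodeUtf8` are illustrative names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class BrowserEncoding {
    // Percent-encodes s as application/x-www-form-urlencoded, using the UTF-8 bytes of each character.
    static String encodeUtf8(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "\u00f1" is the escape for the letter n-tilde, so the source file's encoding cannot alter it.
        System.out.println(encodeUtf8("pira\u00f1a")); // pira%C3%B1a
    }
}
```

Whatever receives this string has to pick a charset to turn those bytes back into characters, and that choice is made on the server, not in the URL.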

| | - If I type "piraña" on http://www.us-webmasters.com/Decode-URLs/ and
| | switch the browser's display encoding between ISO-8859-1 and UTF-8, I
| | can see that with ISO-8859-1 it displays "piraÃ±a", and with UTF-8 it
| | displays "piraña".

| I'm not sure what you think you're doing, there. When I paste that word
| into the box to decode, I get broken output. There is no indication as
| to what encoding the server expects for URIs.

| Switching browser interpretation of the resulting page does not seem to
| prove anything. The server never advertises any encoding to use, so the
| browser just chooses whatever it wants. My browser chooses ISO-8859-1.
| When I switch it to UTF-8, I see the expected interpretation. I'm not
| sure what I just learned.

If we look at this page's source code, we can see the following line:
 <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
I assume this site expects ISO-8859-1 from the browser, so it decodes the input as ISO-8859-1.
In the case of "piraña", it decodes to "piraÃ±a", which is the same as what Tomcat gives my
controller, even though I'm explicitly telling Tomcat to decode as UTF-8.
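The effect of choosing ISO-8859-1 instead of UTF-8 can be reproduced directly. This sketch decodes the same string twice with java.net.URLDecoder; `DecodeBothWays` and `decodeAs` are illustrative names. Under ISO-8859-1 each byte becomes its own character, so 0xC3 0xB1 turns into the two characters "Ã±" instead of "ñ".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeBothWays {
    // Decodes a percent-encoded string, interpreting the %XX bytes with the named charset.
    static String decodeAs(String raw, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "pira%C3%B1a"; // what the browser sends, per this thread
        String asLatin1 = decodeAs(raw, "ISO-8859-1"); // 0xC3 -> U+00C3, 0xB1 -> U+00B1
        String asUtf8 = decodeAs(raw, "UTF-8");        // 0xC3 0xB1 -> U+00F1
        System.out.println(asLatin1.equals("pira\u00c3\u00b1a")); // true
        System.out.println(asUtf8.equals("pira\u00f1a"));         // true
    }
}
```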

| | - Something could be wrong in my Debian environment. How can I find
| | out which environment variables Tomcat is using?

| Tomcat does not use any environment variables. The only settings that
| affect the interpretation of the URI are the "URIEncoding" and
| "useBody..." settings on the <Connector>. Are you using more than one
| connector? Are you using Apache httpd out in front of Tomcat?
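URIEncoding on the `<Connector>` is the server-side place to choose that charset; in the Tomcat 5.5/6 releases of this period, leaving it unset meant ISO-8859-1. If a value has already been decoded as ISO-8859-1, a common application-side workaround is to turn it back into bytes and re-decode them as UTF-8. This is a sketch under that assumption: `latin1ToUtf8` is a hypothetical helper, not a Tomcat API, and it will mangle values once the Connector is switched to URIEncoding="UTF-8".

```java
import java.nio.charset.StandardCharsets;

public class RepairLatin1Decode {
    // Hypothetical helper: undoes a decode that used ISO-8859-1 when the bytes were really UTF-8.
    // ISO-8859-1 maps every byte 0x00-0xFF to the char with the same value, so
    // getBytes(ISO_8859_1) recovers the original bytes exactly.
    static String latin1ToUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fromServer = "pira\u00c3\u00b1a"; // the mis-decoded form, A-tilde plus-minus
        System.out.println(latin1ToUtf8(fromServer).equals("pira\u00f1a")); // true
    }
}
```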

Ah, I forgot to mention: I do have Apache httpd in front of Tomcat, but for testing purposes
I'm accessing Tomcat directly on port 8080. In any case, I get the same results whether I access
Tomcat directly or through httpd.
So, if Tomcat doesn't read environment variables, why would the Debian packagers set LANG to the
system default in their Tomcat init script? Does that make sense?
BTW, the Tomcat instance I'm running on Debian was downloaded manually from tomcat.apache.org.
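One plausible reason for the LANG setting: Tomcat's URI decoding follows the `<Connector>`, but the JVM itself reads the locale. On Linux, JVMs before JDK 18 derive their default charset from the LANG/LC_* variables, and any code that converts between bytes and strings without naming a charset picks it up. This small sketch (`ShowDefaultCharset` is an illustrative name) prints what a given JVM chose; run under the same user and init environment as Tomcat, it should show roughly what Tomcat's JVM sees, unless Tomcat's startup options override file.encoding.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    public static void main(String[] args) {
        // Before JDK 18 this is derived from the process locale (LANG / LC_ALL on Linux);
        // JDK 18 and later default to UTF-8 (JEP 400).
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```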

| | - If I try to manually decode the returned parameter in my
| | controller using URLDecoder.decode(query, "UTF-8"), I see no
| | difference. That is, when debugging Tomcat on Windows the
| | result is "piraña", while debugging the one on Debian the result is
| | "piraÃ±a".

| So, running this:

| URLDecoder.decode(URLEncoder.encode("piraña", "UTF-8"), "UTF-8");
|
| ...gives you "piraÃ±a" on your Debian system? That doesn't seem right.
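Chris's one-liner as a self-contained program. The "\u00f1" escape is deliberate: JDKs before 18 read source files in the platform default encoding unless javac is given -encoding, so a literal "piraña" typed into a file saved as UTF-8 can itself be mangled on a machine with a different default, before any URL code runs. `RoundTrip` is an illustrative name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class RoundTrip {
    // Encodes and decodes with the same, explicitly named charset.
    static String roundTrip(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(URLEncoder.encode(s, "UTF-8"), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "pira\u00f1a";
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```

If this prints false on one machine, the problem is more likely in how the source was compiled than in URLDecoder.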

I realise this test is crap :-) because I'm passing URLEncoder.encode a parameter that has already been decoded.
I'm tired...
I'll try to get the "raw" URL parameter.
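For the raw parameter, HttpServletRequest.getQueryString() returns the query string without decoding it, so the charset can be chosen in application code. A minimal sketch, assuming a simple "name=value&..." query; `RawQueryParam`, `param`, and the parameter names "q" and "page" are hypothetical. It also shows why decoding an already-decoded value cannot help: text containing no '%' or '+' comes back unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class RawQueryParam {
    // Hypothetical helper: finds `name` in a raw, still percent-encoded query string and
    // decodes its value with `charset`. Returns null if the parameter is absent.
    static String param(String rawQuery, String name, String charset)
            throws UnsupportedEncodingException {
        if (rawQuery == null) {
            return null;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            if (URLDecoder.decode(key, charset).equals(name)) {
                return eq >= 0 ? URLDecoder.decode(pair.substring(eq + 1), charset) : "";
            }
        }
        return null;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "q=pira%C3%B1a&page=1"; // made-up query string
        System.out.println(param(raw, "q", "UTF-8").equals("pira\u00f1a")); // true

        // No '%' or '+' present, so URLDecoder returns the input unchanged.
        String alreadyDecoded = "pira\u00c3\u00b1a";
        System.out.println(URLDecoder.decode(alreadyDecoded, "UTF-8").equals(alreadyDecoded)); // true
    }
}
```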

| | - Is URLDecoder#decode environment dependent?

| Nope. As long as you always provide the encoding to be used, you should
| be fine.
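To make "always provide the encoding" concrete, this sketch uses the two java.net.URLDecoder overloads that take a charset: decode(String, String), and decode(String, Charset), added in Java 10. The one-argument decode(String) is deprecated because its result depends on the platform default charset. `ExplicitCharset` and `decodeUtf8` are illustrative names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Charset-object overload (Java 10+): no checked exception, no platform dependence.
    static String decodeUtf8(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "pira%C3%B1a";
        // Charset named as a String (Java 1.4+); throws UnsupportedEncodingException for unknown names.
        String byName = URLDecoder.decode(raw, "UTF-8");
        String byCharset = decodeUtf8(raw);
        // Avoid the deprecated one-argument URLDecoder.decode(raw): it uses the platform
        // default charset, which is the environment dependence you want to rule out.
        System.out.println(byName.equals(byCharset)); // true
    }
}
```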

-chris
