tomcat-users mailing list archives

From Christopher Schultz
Subject Re: URIEncoding
Date Thu, 26 Jul 2007 21:04:40 GMT


Frederic Bastian wrote:
> I'm sorry but I think you don't get it :) Reading and writing URI is
> totally different from writing the response output.

I'd agree that reading a URI is different, but not writing one. Where
are you writing your URI? Into the response, I'm guessing. In fact, I'm
guessing you're writing it into the response /body/, which ought to be
encoded using the response's declared Content-Type (in the HTTP header).
The encoding used for reading the URI from the request is irrelevant here.

> For instance, you
> can set the response character encoding to UTF-8 in order to display
> your html in UTF-8, and set the Connector URIEncoding to ISO-8859-1 to
> read URI in ISO-8859-1 (and so, you have to encode your URI in ISO-8859-1).

Yes, except that most browsers will use the encoding of the previous
response to encode the URI (unless you have "use UTF-8 URLs" turned on
in the options -- most browsers have this feature, and I think it's
turned on by default these days).

> For instance, If you want to make a redirection, you just send a
> redirection header, there is no response output writing, so no matter
> which character encoding your web pages are displayed in.

Now we're getting somewhere. You didn't mention that you were talking
about a redirection URI, which will go into a header. The interesting
part now is that HTTP headers do not have a declared character encoding.
Most browsers use UTF-8 for URI encoding, but as far as I can tell from
the spec, the headers themselves use ASCII.
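
A minimal sketch of the redirect case, assuming a hypothetical "/search"
target with a "q" parameter and assuming UTF-8 is the charset you have
guessed: percent-encoding the value first keeps the Location header value
pure ASCII, so the header's own encoding stops mattering.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class RedirectTarget {
    // Builds a Location header value. "/search" and "q" are hypothetical names.
    static String locationFor(String query, String charset) {
        try {
            // URLEncoder turns every non-ASCII character into %XX escapes
            // of its bytes in the chosen charset.
            return "/search?q=" + URLEncoder.encode(query, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // True if every character fits in 7-bit ASCII.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String location = locationFor("caf\u00e9 au lait", "UTF-8");
        System.out.println(location);          // /search?q=caf%C3%A9+au+lait
        System.out.println(isAscii(location)); // true
    }
}
```

In a servlet, the resulting string is what you would hand to
HttpServletResponse.sendRedirect(). The charset is still your guess; the
sketch only guarantees that the header itself stays ASCII.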

So... how do you decide which character encoding to use for the URI? You
have to guess. It's stupid, but true. The browser will not tell you the
encoding it uses. Forcing your Connector to use ISO-8859-1 or UTF-8 is
just a guess, too. Using your own code to override the default for the
Connector is just adding confusion to a process already fraught with

What makes you think that the Connector has the right answer in the
first place?

> The point is that the character encoding of the <Connector> URIEncoding,
> and the character encoding of the URLEncoder method, have to be consistent.

I believe this to be true only under the following conditions:

1. You are writing a URI to be used in an HTTP header.
2. The URIEncoding used by your Connector was correct in the first place.

The only way to tell if the encoding was right in the first place is to
encode parameters whose values you /know/ and then check them on the
other end to see if the browser really was using UTF-8 or ISO-8859-1 (or
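
A rough sketch of that check, assuming you plant a parameter whose value
you know (here a single "é") and can get at its raw, still percent-encoded
form, for example from request.getQueryString(). The class name and the
candidate list are illustrative, not anything Tomcat provides.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class EncodingProbe {
    // The value we deliberately put into the link or form: "é" (U+00E9).
    static final String KNOWN_VALUE = "\u00e9";

    // Charsets to try, in order. UTF-8 goes first because bytes that are
    // not valid UTF-8 decode to U+FFFD and therefore cannot match.
    static final String[] CANDIDATES = { "UTF-8", "ISO-8859-1" };

    // Returns the first candidate charset under which the raw value decodes
    // back to KNOWN_VALUE, or null if none of them does.
    static String detect(String rawValue) {
        for (String charset : CANDIDATES) {
            try {
                if (KNOWN_VALUE.equals(URLDecoder.decode(rawValue, charset))) {
                    return charset;
                }
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(detect("%C3%A9")); // UTF-8
        System.out.println(detect("%E9"));    // ISO-8859-1
        System.out.println(detect("%3F"));    // null
    }
}
```

This only tells you what one browser did for one request; it is evidence
for your guess, not a guarantee about the next client.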

> Make the try : set the response character encoding to UTF-8, set the
> URLEncoder character encoding to UTF-8, generate a web page including
> links with encoded parameters with special chars, and follow these
> links. You will see that the server does not interpret correctly the
> parameters, because the <Connector> URIEncoding is still set to ISO-8859-1.

If you are setting the URIEncoding of the Connector to UTF-8 and it's
not interpreting it as UTF-8, then Tomcat has a bug. Since you are the
only one experiencing this phenomenon, I'm guessing it's not a bug.

If you have everything set to UTF-8 (as I do in my production apps), you
should not have this problem.
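
To illustrate why matching charsets matter, here is a simplified model,
not Tomcat's actual decoding path: the link is encoded with one charset
and the server side decodes with another. It assumes the browser sends
the percent-escapes exactly as written in the link, and the parameter
names linkCharset and connectorCharset are made up for the sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UriRoundTrip {
    // Encodes a value as it would appear in a generated link, then decodes
    // it the way a server configured with connectorCharset would read it.
    static String roundTrip(String value, String linkCharset, String connectorCharset) {
        try {
            String encoded = URLEncoder.encode(value, linkCharset);
            return URLDecoder.decode(encoded, connectorCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // "é"

        // Both sides on UTF-8: the value survives.
        System.out.println(original.equals(roundTrip(original, "UTF-8", "UTF-8"))); // true

        // UTF-8 link read as ISO-8859-1: the two bytes C3 A9 become
        // two characters, U+00C3 and U+00A9.
        String garbled = roundTrip(original, "UTF-8", "ISO-8859-1");
        System.out.println("\u00c3\u00a9".equals(garbled)); // true
    }
}
```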

> So, for portability purpose, I'd like to make the character encoding of
> the <Connector> and of the URLEncoder consistent, without modifying the
> server.xml file. But it looks pretty impossible :p

I disagree that the Connector knows any better than you do about how to
encode outgoing URLs. The browser is going to do whatever the heck it
wants, and it's not going to tell you what it did. You just have to guess.

-chris
