tomcat-users mailing list archives

From André Warnier
Subject Re: URIEncoding
Date Fri, 16 Dec 2011 20:37:11 GMT
starz10de wrote:
> I have an application which is running in local machine and it work perfect.
> I installed my application in the server to make it available for all. In
> the server we have tomcat running and provide services for many instances.
> After I played my application in the server, I had problem with query which
> have special language character. After long time, I could find where is the
> problem. The problem was in server.xml where the URIEncoding is set to
> "UTF-8". I made test and just removed this line or set it to "ISO-8859-1"
> and all was perfect. My question here is it possible to set the URIEncoding
> for each instance or is it possible to set it some where else. I send the
> query from jsp page to the servlet. in my jsp page the charset=ISO-8859-1".
> I tried to make all utf-8 but I couldn't success. I tried the filter
> approach but also doesn't help: 
> <filter> 
> <filter-name>Set Character Encoding</filter-name> 
> <filter-class>servlet.CharsetFilter</filter-class> 
> <init-param> 
> <param-name>encoding</param-name> 
> <param-value>ISO-8859-1</param-value> 
> </init-param> 
> </filter> 
> <!-- Define filter mappings for the defined filters --> 
> <filter-mapping> 
> <filter-name>Set Character Encoding</filter-name> 
> <servlet-name>action</servlet-name> 
> </filter-mapping> 
> Any hint will be appreciated. 


1) By default, under HTTP (and HTML), the character set is ISO-8859-1.
So, if you do not specify anything anywhere to say something else, everything should be 
understood and processed as ISO-8859-1.

2) When a browser submits the contents of a <form> to a server, it will /generally/ use
the same character set as the one which /it thinks/ is the character set of the *current*
page (the one which is currently shown on the screen == the one which contains the link or
button which will send data to the server).

So, what you need to do is to look in the browser, under "Page info" or similar, to see
which character set the browser believes is in effect for the current page.

3) Normally also, this character set will be the one which, in the page source, is 
indicated by the following tag :
<meta http-equiv="content-type" content="text/html; charset=XXXXX" />
(it is the XXXXX above)
So make sure that all the pages that you send to the browser contain such a tag, with the
correct character set.

4) Thus, if your pages are UTF-8, then any link in the page which "calls" the server, is 
going to send all values to the server in the UTF-8 character set.
That includes the "query-string" part of URLs, and also the POST parameters which may be sent.
If that is the case, you need to tell the server that it is so, because that is /not/ the
default for HTTP.
So that is when you should use the "URIEncoding" parameter : if your forms are sending
requests to the server containing a query-string.

5) If your forms are sending values by means of POST requests, then the situation gets 
more complicated, if you use a character set other than ISO-8859-1.
But let's leave that for the next time.
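
To make points 2) and 4) a bit more concrete, here is a small sketch in plain Java (no
Tomcat involved). It only mimics the principle: the browser percent-encodes a value using
the page's character set, and the server has to decode it with the same one. The class and
method names are made up for the example, and the Charset overloads of URLEncoder and
URLDecoder assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat or of the original application.
class QueryEncodingDemo {

    // Roughly what a browser does with a form value, for a given page charset.
    static String encode(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Roughly what a server does with the query-string, for a given URI charset.
    static String decode(String raw, Charset uriCharset) {
        return URLDecoder.decode(raw, uriCharset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e

        String latin1Query = encode(value, StandardCharsets.ISO_8859_1);
        String utf8Query = encode(value, StandardCharsets.UTF_8);
        System.out.println(latin1Query); // caf%E9
        System.out.println(utf8Query);   // caf%C3%A9

        // Same charset on both sides: the value survives.
        System.out.println(decode(utf8Query, StandardCharsets.UTF_8).equals(value)); // true

        // Page was UTF-8, server decodes as ISO-8859-1: the value is garbled.
        System.out.println(decode(utf8Query, StandardCharsets.ISO_8859_1).equals(value)); // false

        // Page was ISO-8859-1, server decodes as UTF-8: the value is garbled too.
        System.out.println(decode(latin1Query, StandardCharsets.UTF_8).equals(value)); // false
    }
}
```

The last case looks like what you describe: pages declared as ISO-8859-1, with a server
told to decode the query-string as UTF-8.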

A question maybe, for later : what is/are the (human) language(s) that are used on your 
pages ?

(*) I also /strongly/ advise, for issues of that nature, that you get a browser plug-in 
such as HttpFox or similar (for Firefox) or Fiddler2 (for Internet Explorer), to be able 
to check exactly what is being sent from the browser to the server and vice-versa.

(**) Unfortunately, in Java the internal representation for characters and strings is 
Unicode, which can lead to mixups if you are not careful.

Or, let me turn this around : it is much better to use Unicode as a character set, than 
any other "alphabet".  But unfortunately, in the WWW, for historical reasons, the default
is still ISO-8859-1, which creates many problems when one tries to deal with non-English
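
Coming back to (**), here is a minimal sketch of the kind of mixup I mean. A Java String
has no "charset" of its own; a character set only comes into play when the string is
turned into bytes, or bytes are turned back into a string. If the two conversions use
different character sets, you get garbage. The class and method names are again made up
for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, only meant to illustrate the byte/String conversions.
class CharsetMixupDemo {

    // String (Unicode, internal) -> bytes in a given character set.
    static byte[] toBytes(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Bytes in a given character set -> String (Unicode, internal).
    static String fromBytes(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // e with an acute accent

        // The same character takes 1 byte in ISO-8859-1 and 2 bytes in UTF-8.
        System.out.println(toBytes(e, StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(toBytes(e, StandardCharsets.UTF_8).length);      // 2

        // Encoding as UTF-8 but decoding as ISO-8859-1 yields two wrong characters.
        String mixedUp = fromBytes(toBytes(e, StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(mixedUp.length());  // 2
        System.out.println(mixedUp.equals(e)); // false
    }
}
```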
