tomcat-users mailing list archives

From André Warnier
Subject Re: mojk and utf8 charset problem
Date Fri, 29 Apr 2011 15:24:16 GMT
Thierry Templier wrote:
> Hello,
> I developed an application that uses UTF-8 encoding since it needs to display Arabic
> characters. When I access the application directly from Tomcat, everything works fine. When
> I try to access it through the Apache web server and mod_jk, I have problems displaying such
> characters. UTF-8 is correctly configured within the Apache web server since I can display them
> from static pages. So it seems the problem comes from mod_jk.
> Is there a way to configure mod_jk to use UTF-8 encoding for HTTP requests and responses?

I suggest getting one of the browser add-ons that can display the complete HTTP
response from the web server to the browser (i.e. the HTTP headers as well as the content).
For Firefox, you can use for example HttpFox; for IE, there is Fiddler2. A quick search in
Google will lead you to the download page.

Install one of those, re-do your server request, and carefully compare what you get back
a) from Tomcat directly
b) from Apache + mod_jk + Tomcat
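If you would rather script that comparison than click through a browser add-on, something along these lines could work. It is only a minimal Python 3 sketch using the standard library; the two URLs (port 8080 for Tomcat's own HTTP connector, port 80 for Apache httpd) and the path are hypothetical placeholders for your own setup.

```python
import urllib.request

# Hypothetical URLs: replace host, ports and path with your own setup.
DIRECT_URL = "http://localhost:8080/myapp/index.jsp"  # Tomcat's own HTTP connector
PROXIED_URL = "http://localhost/myapp/index.jsp"      # Apache httpd + mod_jk


def fetch(url):
    """GET url and return (headers with lowercased names, raw body bytes)."""
    with urllib.request.urlopen(url) as resp:
        headers = {name.lower(): value for name, value in resp.getheaders()}
        return headers, resp.read()


def compare(direct, proxied, names=("content-type",)):
    """Return a list of differences between two (headers, body) pairs."""
    direct_headers, direct_body = direct
    proxied_headers, proxied_body = proxied
    diffs = []
    for name in names:
        a, b = direct_headers.get(name), proxied_headers.get(name)
        if a != b:
            diffs.append(f"{name}: {a!r} (direct) vs {b!r} (mod_jk)")
    # Compare the raw bytes, not decoded text, so charset problems show up.
    if direct_body != proxied_body:
        diffs.append(f"body differs: {len(direct_body)} vs {len(proxied_body)} bytes")
    return diffs


# Usage (needs both servers running):
#   for line in compare(fetch(DIRECT_URL), fetch(PROXIED_URL)) or ["no differences"]:
#       print(line)
```

Keep in mind that the bodies can legitimately differ (for example if the page embeds a session id in its links), so a body difference is a hint to look closer rather than proof of an encoding problem; a differing Content-type header is the more telling sign.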

The way a browser displays a page (in terms of charset) depends on three elements:

1) when the server sends a response, it includes a "Content-type" HTTP header, which in
this case should be something like:
Content-type: text/html; charset=UTF-8

2) any <meta> tags included inside the <head> portion of the HTML page.
For example, a tag such as:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

3) the way in which the browser (each specific browser, and sometimes even version) 
interprets the above.
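To see what the first two elements actually declare in a given response, a small script like the following may help. It is a sketch assuming Python 3 and only the standard library (`email.message.Message.get_content_charset()` for the header, `html.parser.HTMLParser` for the `<meta>` tags); the function and class names are made up for illustration. As an assumption beyond the above, it also accepts the shorter HTML5 form `<meta charset="...">`. In the http-equiv form, the declared value is carried by the `content` attribute.

```python
from email.message import Message
from html.parser import HTMLParser


def header_charset(content_type):
    """Charset parameter of a Content-Type value, lowercased, or None if absent."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: remembers the charset of the first charset-bearing <meta>."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = {name: value or "" for name, value in attrs}
        if "charset" in attrs:  # HTML5 short form: <meta charset="...">
            self.charset = attrs["charset"].strip().lower() or None
        elif attrs.get("http-equiv", "").lower() == "content-type":
            # The declared value lives in the "content" attribute.
            self.charset = header_charset(attrs.get("content", ""))


def declared_charsets(content_type, body):
    """Return (charset from the HTTP header, charset from <meta>) for raw bytes."""
    finder = MetaCharsetFinder()
    # latin-1 never fails and keeps ASCII markup intact, which is all we scan.
    finder.feed(body.decode("latin-1"))
    finder.close()
    return header_charset(content_type), finder.charset
```

If the two values disagree, or one of them is missing in the Apache + mod_jk case but present in the direct Tomcat case, that is where to start looking.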

According to the HTTP RFCs, the browser SHOULD NOT "second-guess" what the server says in
terms of content type. In other words, if the server says
Content-type: something; charset=somecharset
then the browser should follow this blindly, and not make its own determination.
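As a rough model of that rule, a browser that follows the RFCs would choose the charset more or less as below, and decoding with the wrong one is what turns Arabic text into garbage. This is a simplified Python sketch, not the real browser algorithm: actual browsers also look at byte-order marks and may sniff the content, and the `windows-1252` fallback is an assumption (the real default varies by browser and locale).

```python
def effective_charset(header_charset, meta_charset, default="windows-1252"):
    """Simplified precedence: HTTP header first, then <meta>, then a fallback.

    The fallback value is an assumption; real browsers vary by locale.
    """
    for candidate in (header_charset, meta_charset):
        if candidate:
            return candidate.lower()
    return default


def decode_body(body, header_charset, meta_charset):
    """Decode raw response bytes according to the simplified rule above."""
    charset = effective_charset(header_charset, meta_charset)
    return body.decode(charset, errors="replace")
```

For example, UTF-8 encoded Arabic decodes correctly when the header says `charset=UTF-8`, but comes out as mojibake if the header says `ISO-8859-1`, even when the `<meta>` tag correctly says UTF-8, because the header wins.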

However, IE for one is notorious for not following this aspect of the RFCs, and for constantly
trying to determine by itself what it is receiving, often in contradiction to what the
server says. Worse, the determination it makes depends on the version of IE, and
sometimes even on the patches applied to it or to Windows.

3a) Ultimately, it is the user who is in control. In the browser settings, there is a way
to override the above and force the browser to always display the page in a specific
character set. It does not sound like this is an issue in your case, but it is better to check anyway.

But first, make sure that what you are receiving in one case or the other is really the 
same, headers and content.
And maybe also try it with different browsers, to see if the result is always the same.

Once you know the answer to that, then you can start looking for the issue in a more 
focused way.
