tomcat-users mailing list archives

From "Andoni" <>
Subject How to UTF-8 your site.
Date Tue, 10 Jun 2003 11:26:33 GMT

I have recently completed the torturous process of translating my web-site into 16 European
languages.  Having had lots of advice from this list and other sources, I have arrived at
a few conclusions about what a Java / Tomcat web-site needs in order to fully support UTF-8.

These are:

JSP pages must include the header:

<%@ page
 contentType="text/html; charset=UTF-8"
%>

In the Catalina.bat (Windows) or apache$ (OpenVMS) file,
a switch must be added to the call to java.exe.  The switch is:


I cannot find documentation for this environment variable anywhere, or for what it actually
does, but it is essential.

For translation of inputs coming back from the browser there must be a method that translates
from the browser's ISO-8859-1 to UTF-8.  It seems to me that ISO-8859-1 is used in all regions,
as I have had people in countries such as Greece and Bulgaria test this and they always send
input back in ISO-8859-1 encoding.  The method, which you will use constantly, should go
something like this:

 // Requires: import java.io.UnsupportedEncodingException;

 /**
  * Convert ISO8859-1 format string (which is the default sent by IE)
  * to the UTF-8 format that the database is in.
  */
 public String toUTF8(String isoString) {
  String utf8String = null;
  if (null != isoString && !isoString.equals("")) {
   try {
    // Recover the raw bytes, then decode those same bytes as UTF-8.
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
   } catch (UnsupportedEncodingException e) {
    // As we can't translate just send back the best guess.
    System.out.println("UnsupportedEncodingException is: " + e.getMessage());
    utf8String = isoString;
   }
  } else {
   // Null or empty input is passed back unchanged.
   utf8String = isoString;
  }
  return utf8String;
 }
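
To see what the toUTF8 method does to a string, here is a minimal stand-alone sketch of the round trip. It rests on one assumption that is not stated in the message: that the form data arrives as UTF-8 bytes and is decoded as ISO-8859-1 before the application sees it. The class name Utf8RoundTrip, the sample Greek word and the use of StandardCharsets (Java 7 or later) are all illustrative choices, not part of the original method.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Hypothetical variant of toUTF8: same logic, but StandardCharsets
    // avoids the checked UnsupportedEncodingException.
    static String toUTF8(String isoString) {
        if (isoString == null || isoString.isEmpty()) {
            return isoString; // nothing to convert
        }
        // Recover the raw bytes, then decode those same bytes as UTF-8.
        byte[] rawBytes = isoString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample input (the Greek word for "Greek"), written with Unicode
        // escapes so the source file encoding does not matter.
        String typed = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";

        // Assumption: the text travels as UTF-8 bytes...
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8);

        // ...and is decoded as ISO-8859-1 before the application sees it.
        String misdecoded = new String(onTheWire, StandardCharsets.ISO_8859_1);

        String repaired = toUTF8(misdecoded);

        System.out.println(misdecoded.equals(typed)); // prints "false"
        System.out.println(repaired.equals(typed));   // prints "true"
    }
}
```

The repair only works because ISO-8859-1 maps every byte value to exactly one character, so no bytes are lost in the intermediate string. If the input really had been typed and sent as ISO-8859-1 text containing non-ASCII characters, decoding it as UTF-8 would generally corrupt it instead.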

I have found that these three steps are all that is necessary to make your site accept any
language that UTF-8 can work with.  I extend my thanks to those of you on the Tomcat users
list who helped me find these little gems.

Kind regards,
