tomcat-users mailing list archives

From André Warnier
Subject Re: Tomcat 6 encoding issue
Date Thu, 12 Nov 2009 10:25:08 GMT
Konstantin Kolinko wrote:
> 2009/11/12 pramodpm:
>> We are getting the following error:
>> Not an ISO 8859-1 character: <EF><BF><83>.
>> It is not just <83>. Sorry, I missed those last time.
>> We are working with Java 6. If I use Tomcat 5.5.23, it works... But we
>> would like to use Tomcat 6.
> Those 5.5 and 6.0 instances are probably running on different computers,
> with different locale settings in their OSes.
> There are places in programs where byte -> character conversion
> occurs. In all those places you should explicitly specify which
> encoding those bytes are in.
> If you do not specify the encoding explicitly (if you are lazy or do
> not know how to do it), you will end up with the platform default
> encoding, and that will be different in different locales.
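To illustrate that point, here is a minimal Java sketch (the class and method names are made up for this example) that decodes the three bytes from the error message once with the platform default encoding and once with an explicit UTF-8 charset. Only the explicit variants give the same result on every machine. It uses `Charset.forName("UTF-8")` so that it also compiles on Java 6; on Java 7 and later, `StandardCharsets.UTF_8` is equivalent.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class ExplicitDecoding {
    // Charset.forName works on Java 6; StandardCharsets.UTF_8 is the Java 7+ equivalent
    static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Risky: the result depends on the JVM's platform default encoding (OS locale). */
    static String decodeWithPlatformDefault(byte[] raw) {
        return new String(raw);
    }

    /** Safer: the encoding is stated, so every machine produces the same String. */
    static String decodeExplicitly(byte[] raw, Charset encoding) {
        return new String(raw, encoding);
    }

    /** Same rule for streams: wrap the bytes in a Reader with an explicit charset. */
    static String readFirstLine(byte[] raw, Charset encoding) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), encoding));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // The three bytes quoted in the error message
        byte[] raw = { (byte) 0xEF, (byte) 0xBF, (byte) 0x83 };

        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("implicit decode length: " + decodeWithPlatformDefault(raw).length());
        System.out.println("explicit decode length: " + decodeExplicitly(raw, UTF_8).length());
        System.out.println("stream decode length:   " + readFirstLine(raw, UTF_8).length());
    }
}
```

On a machine whose default encoding is UTF-8, the implicit decode yields 1 character; under ISO-8859-1 or windows-1252 it yields 3, which is exactly the kind of machine-to-machine difference described above.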
What Konstantin writes above is true. In addition:

If you were running Tomcat 6 on the same machine as Tomcat 5.5, with
exactly the same environment, and retrieving the same external page,
then the error (or absence of error) should be the same under Tomcat 5.5
and Tomcat 6, because the Java servlet classes that you are using are
the same. So something other than the Tomcat version itself must be
different between your Tomcat 5.5 setup and your Tomcat 6 setup.
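One way to find out what differs is to print, from inside each Tomcat's JVM (for example from a small test servlet or JSP), the settings that determine the platform default encoding. The helper below is hypothetical, not part of Tomcat. Running it standalone only says something about Tomcat if it is started with the same JVM, the same environment variables (such as `LANG`/`LC_ALL` on Unix), and the same `JAVA_OPTS`/`CATALINA_OPTS` as the Tomcat startup script.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class EncodingEnvironment {
    /** Collects the JVM settings that decide the platform default encoding. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.version   = ").append(System.getProperty("java.version")).append('\n');
        sb.append("file.encoding  = ").append(System.getProperty("file.encoding")).append('\n');
        sb.append("defaultCharset = ").append(Charset.defaultCharset().name()).append('\n');
        sb.append("default locale = ").append(Locale.getDefault()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Compare this output between the Tomcat 5.5 and Tomcat 6 environments
        System.out.print(describe());
    }
}
```

If `file.encoding` or the default locale differs between the two installations, that alone can explain why the same code behaves differently; a `-Dfile.encoding=...` option in one of the startup scripts is one place to look.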

The key here is that, inside your application, you have a Unicode
string containing characters that are valid in Unicode. But then you
try to output that string to the ServletOutputStream, which is set for
ISO-8859-1. This means that Java must do a character set conversion,
from the internal Unicode to the external ISO-8859-1 output stream.
That is when it complains, because the string contains a Unicode
character (U+FFC3, which in UTF-8 is the byte sequence <EF><BF><83>)
that has no valid representation in ISO-8859-1.
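The conversion failure can be reproduced without a servlet container. This sketch (hypothetical class name, plain `java.nio` only) finds the first character ISO-8859-1 cannot represent and performs a strict conversion that throws instead of substituting. It is not the servlet API's own code, only the same idea: ISO-8859-1 covers U+0000 to U+00FF, so U+FFC3 cannot be converted.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class Latin1Check {
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    /** Returns the index of the first char ISO-8859-1 cannot represent, or -1 if none. */
    static int firstUnmappable(String s) {
        CharsetEncoder encoder = ISO_8859_1.newEncoder();
        for (int i = 0; i < s.length(); i++) {
            if (!encoder.canEncode(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    /** Strict conversion: throws on unmappable input instead of silently substituting. */
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \uFFC3"; // U+00E9 fits in ISO-8859-1, U+FFC3 does not
        System.out.println("first unmappable index: " + firstUnmappable(text));
        try {
            encodeStrict(text);
            System.out.println("converted without errors");
        } catch (CharacterCodingException e) {
            System.out.println("conversion failed: " + e);
        }
    }
}
```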

So you must either change your ServletOutputStream to be UTF-8 as well
(and make sure everything else is set consistently with that), or else
filter the output characters before passing them to the output stream,
removing anything that is not ISO-8859-1 or replacing it with a
placeholder character (for example "?").
How that fits into your application, we cannot tell.
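Both options can be sketched in plain Java. The class and method names are hypothetical, and a `ByteArrayOutputStream` stands in for the response body. In a real servlet, the UTF-8 route would typically mean calling `response.setContentType("text/html; charset=UTF-8")` (or `response.setCharacterEncoding("UTF-8")`) before `response.getWriter()`, and then writing text through that `Writer` rather than through `ServletOutputStream.print(String)`. The filtering route keeps ISO-8859-1 and replaces every code point above U+00FF with "?".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class OutputOptions {
    static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Option 1: write the text as UTF-8; every Unicode character survives. */
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream body = new ByteArrayOutputStream(); // stands in for the response body
        try {
            Writer out = new OutputStreamWriter(body, UTF_8);
            out.write(text);
            out.close(); // flushes the encoder into the byte stream
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail; this only satisfies the checked signature
            throw new IllegalStateException(e);
        }
        return body.toByteArray();
    }

    /** Option 2: stay in ISO-8859-1, replacing each code point above U+00FF with '?'. */
    static String replaceNonLatin1(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // handles surrogate pairs as one character
            sb.append(cp <= 0xFF ? (char) cp : '?');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \uFFC3";
        System.out.println("UTF-8 byte count: " + writeAsUtf8(text).length);
        System.out.println("filtered: " + replaceNonLatin1(text));
    }
}
```

`String.getBytes(Charset)` with ISO-8859-1 already substitutes "?" for unmappable characters, but doing the replacement explicitly makes the loss of data visible in your own code.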

There is no quick-and-dirty solution to this kind of thing, and no 
single Tomcat or Java setting that will solve the problem.
You are dealing with multilingual data at the input, so you need to 
handle that properly.
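On the input side, handling multilingual data properly usually starts with decoding incoming bytes in the encoding the sender declared. A minimal sketch, assuming the external page is fetched over HTTP and its `Content-Type` header value is available (for instance from `HttpURLConnection.getContentType()`): it picks the declared charset and falls back to ISO-8859-1, the default RFC 2616 specifies for `text/*` types. The helper names are hypothetical, and charsets declared only inside the HTML (`<meta>` tags) are not handled. For form data posted to your own servlet, the counterpart is calling `request.setCharacterEncoding(...)` before reading any parameters.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class InputDecoding {
    static final Charset FALLBACK = Charset.forName("ISO-8859-1");

    /**
     * Picks the charset declared in a Content-Type value such as
     * "text/html; charset=UTF-8", or ISO-8859-1 if none (or an unknown one) is declared.
     */
    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return FALLBACK;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and UnsupportedCharsetException
                    return FALLBACK;
                }
            }
        }
        return FALLBACK;
    }

    /** Decodes a response body with the charset its Content-Type declares. */
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xEF, (byte) 0xBF, (byte) 0x83 };
        System.out.println(charsetFromContentType("text/html; charset=UTF-8"));
        System.out.println(charsetFromContentType("text/html"));
        System.out.println(decodeBody(raw, "text/plain; charset=UTF-8").length());
    }
}
```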
