tomcat-users mailing list archives

From Adam Hardy
Subject Re: charset problems coming up during runtime
Date Wed, 05 Nov 2003 17:17:00 GMT
On 11/05/2003 09:31 AM Christoph Lechleitner wrote:
> I have a really weird problem with charset handling concerning special
> characters like German "umlaute" (i.e. ä, ö, ü) (it also concerns 
> characters from French and so on).
> I have done extensive Google and list searches, but all information I 
> found handles installations that are unable to handle special characters 
> at all, but my problem is a bit different:
> When Tomcat (4.1.27) starts, my applications handle umlauts absolutely
> correctly (i.e., an ä read from a file or a database is encoded correctly
> as &auml; by my encoding methods).
> But, after some (mostly long) runtime, this changes and an ä is suddenly
> detected as "something completely different", forcing my methods to
> replace it with a ? or a space.
> Unfortunately, as the problem never occurs on a freshly started Tomcat,
> it is impossible to reproduce it reliably ;-<<
> My observation and research results so far: 
> - The problem occurs before my encoding loop can do its work, i.e.
>   an 'ä' in a String to be parsed does not match a constant char 'ä'
>   any more, or in other words ...
>   somestring.charAt(someIndex) == 'ä'
>   is false although the character is an 'ä'.
>   This observation also means that no output filtering functionality
>   (which AFAIK I do not use) can be the "evil".
> - As it happens with strings read from files as well as with strings
>   read from mysql databases, it seems to be a Tomcat or JRE(?) problem.
> - As the problem does not exist with a "freshly" started Tomcat, the
>   general environment (language settings and so on) seems to be correct.
> - In most cases, the problem starts after days or even weeks without
>   a tomcat restart, but sometimes it occurs only minutes or a few hours
>   after tomcat's start.
> - It happens with several Sun JDKs from 1.3.1 up to all major 1.4.x
>   releases, i.e., 1.4.0, 1.4.1, 1.4.2.
> The software versions used are:
> - Tomcat 4.1.x (currently I am using 4.1.27)
> - SuSE Linux 8.1, 8.2, 9.0, kernels 2.4.*, optimized for Athlon family.
> - I am not using any template-engine or filter-functions of tomcat
>   (as far as I understand it ;->>)
> - System, filesystems, and all applications set to use ANSI, or rather
>   ISO-8859-1 / ISO-8859-15, which share the same codes at least for all
>   legacy characters and German umlauts.
> I am not sure if I should blame the JRE or SuSE or the compilers (jikes!?)
> perhaps (instead of stealing your time), but if my problem is caused by
> some kind of bug or perhaps by an undetected feature in one of these
> pieces of software, this list is, by far, my best hope to find other
> victims ;;-))
> Any Ideas?

Hi Christoph,

With a difficult-to-reproduce bug like this, you have to narrow down the 
problem area more.

Basically you are saying the problem is with reading these characters 
from files and databases. Is that correct? When the problem occurs, does 
Tomcat carry on handling incoming characters correctly, e.g. saving them 
to the DB or file correctly? I mean, how does the problem manifest 
itself? In the browser, or the database or the logs or other files?
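
One way to answer that is to log the raw code units of a suspect string next to the JVM's default charset at the moment the problem shows up. The following is only a minimal sketch, not code from the application in question: the class name CharDump and the helper codeUnits are made up for illustration, and Charset.defaultCharset() assumes Java 5 or later (on the 1.3/1.4 JVMs mentioned above, only the file.encoding system property would be available).

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper; names are illustrative only.
class CharDump {
    // Render each char of s as a four-digit lowercase hex code unit,
    // separated by spaces, so the log does not depend on console encoding.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String hex = Integer.toHexString(s.charAt(i));
            for (int pad = hex.length(); pad < 4; pad++) {
                sb.append('0');
            }
            sb.append(hex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default used by readers and writers that are not given a charset.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // The sample is written as a Unicode escape (U+00E4, a-umlaut) so it
        // does not depend on the encoding the source file was compiled with.
        String sample = args.length > 0 ? args[0] : "\u00e4";
        System.out.println("code units:      " + codeUnits(sample));
    }
}
```

If an ä comes out as 00e4, the string itself is intact and the damage happens later. A result of 00c3 00a4 would typically point to UTF-8 bytes decoded as ISO-8859-1, and fffd to a decoder that rejected the byte. Comparing against an escaped constant rather than a literal 'ä' also takes the compiler's source encoding out of the picture.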

Which character set are you using? iso-8859-1 or iso-8859-15? You say 
you have your OS, your Java, appserver and database all set to use one 
of these, if I understand correctly. Presumably consistently the one or 
the other and not a mix?
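
To take the platform default out of the equation, the charset can be named explicitly wherever bytes are turned into characters. This is a sketch under assumptions rather than a known fix for this bug: the class ExplicitCharset and its methods decode and open are hypothetical, and it relies on the JVM shipping an ISO-8859-15 decoder, which common JDKs do.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper; names are illustrative only.
class ExplicitCharset {
    // Decode raw bytes with a named charset instead of the platform default.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Wrap a byte stream in a reader that uses the named charset.
    static BufferedReader open(InputStream in, String charsetName) {
        return new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        // 0xE4 is a-umlaut in both charsets; 0xA4 is where they differ.
        byte[] raw = { (byte) 0xE4, (byte) 0xA4 };
        String latin1 = decode(raw, "ISO-8859-1");
        String latin9 = decode(raw, "ISO-8859-15");
        System.out.println(Integer.toHexString(latin1.charAt(0)) + " "
                + Integer.toHexString(latin1.charAt(1)));
        System.out.println(Integer.toHexString(latin9.charAt(0)) + " "
                + Integer.toHexString(latin9.charAt(1)));
    }
}
```

Byte 0xE4 decodes to U+00E4 under both charsets, so a mix of the two would not by itself explain a broken ä. The two differ at 0xA4, which is the currency sign U+00A4 in ISO-8859-1 and the euro sign U+20AC in ISO-8859-15.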

Is this app in production? What sort of load is it handling? I ask to 
see how feasible it would be to change the appserver: try the 90-day 
trial of WebLogic, for instance. Does that suffer from the problem too?

What about changing to IBM Java? Or even from Linux to Windows?


struts 1.1 + tomcat 5.0.12 + java 1.4.2
Linux 2.4.20 RH9
