Message-ID: <3FA9308C.9080805@cyberspaceroad.com>
Date: Wed, 05 Nov 2003 18:17:00 +0100
From: Adam Hardy
To: Tomcat Users List
Subject: Re: charset problems coming up during runtime

On 11/05/2003 09:31 AM Christoph Lechleitner wrote:
> I have a really weird problem with charset
> handling concerning special characters like German umlauts
> (i.e. ä, ö, ü) (it also concerns characters from French and so on).
>
> I have done extensive Google and list searches, but all the information
> I found deals with installations that are unable to handle special
> characters at all. My problem is a bit different:
>
> When Tomcat (4.1.27) starts, my applications handle umlauts absolutely
> correctly (i.e., an ä read from a file or a database is encoded
> correctly as &auml; by my encoding methods).
>
> But after some (mostly long) runtime this changes, and an ä is suddenly
> detected as "something completely different", forcing my methods to
> replace it with a ? or a space.
>
> Unfortunately, as the problem never occurs on a freshly started Tomcat,
> it is impossible to reproduce it reliably ;-<<
>
> My observations and research results so far:
>
> - The problem occurs before my encoding loop can do its work, i.e.
>   an 'ä' in a String to be parsed does not match a constant char 'ä'
>   any more, or in other words ...
>     somestring.charAt(someIndex) == 'ä'
>   is false although the character is an 'ä'.
>   This observation also means that no output filtering functionality
>   (which AFAIK I do not use) can be the "evil".
>
> - As it happens with strings read from files as well as with strings
>   read from MySQL databases, it seems to be a Tomcat or JRE(?) problem.
>
> - As the problem does not exist with a "freshly" started Tomcat, the
>   general environment (language settings and so on) seems to be correct.
>
> - In most cases, the problem starts after days or even weeks without
>   a Tomcat restart, but sometimes it occurs only minutes or a few hours
>   after Tomcat's start.
>
> - It happens with several Sun JDKs from 1.3.1 up to all major 1.4.x
>   releases, i.e., 1.4.0, 1.4.1, 1.4.2.
>
> The software versions used are:
> - Tomcat 4.1.x (currently I am using 4.1.27)
> - SuSE Linux 8.1, 8.2, 9.0, kernels 2.4.*, optimized for Athlon family.
> - I am not using any template engine or filter functions of Tomcat
>   (as far as I understand it ;->>)
> - System, filesystems, and all applications are set to use ANSI, or
>   rather ISO-8859-1 / ISO-8859-15, which share the same codes at least
>   for all legacy characters and German umlauts.
>
> I am not sure if I should blame the JRE or SuSE or perhaps the
> compilers (jikes!?) (instead of stealing your time), but if my problem
> is caused by some kind of bug or perhaps by an undetected feature in
> any of this software, this list is, by far, my best hope to find
> other victims ;;-))
>
> Any ideas?

Hi Christoph,

With a difficult-to-reproduce bug like this, you have to narrow down the
problem area further.

Basically you are saying the problem is with reading these characters
from files and databases. Is that correct?

When the problem occurs, does Tomcat carry on handling incoming
characters correctly, e.g. saving them to the DB or to a file correctly?
I mean, how does the problem manifest itself? In the browser, the
database, the logs, or other files?

Which character set are you using, ISO-8859-1 or ISO-8859-15? You say
you have your OS, your Java, your appserver and your database all set to
use one of these, if I understand correctly. Presumably consistently the
one or the other, and not a mix?

Is this app in production? What sort of load is it handling? I ask to
see how feasible it would be to change the appserver: try the 90-day
trial of WebLogic, for instance. Does that suffer from the problem too?
What about changing to IBM Java? Or even from Linux to Windows?

Adam
-- 
struts 1.1 + tomcat 5.0.12 + java 1.4.2
Linux 2.4.20 RH9
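
One way to start narrowing this down is to log the actual code points of a
suspect string together with the JVM's default charset at the moment the
comparison fails. The following is only a minimal sketch, not anything taken
from the application discussed here: the class name CharsetProbe and its
methods are hypothetical, and it assumes a current JDK (Charset.defaultCharset,
StandardCharsets and String.format do not exist on the 1.3/1.4 JDKs mentioned
in the thread; there, new String(bytes, "ISO-8859-1") and Integer.toHexString
would be the closest equivalents). The char constant is written as a Unicode
escape so that the comparison does not depend on the encoding the compiler
assumed for the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; all names here are illustrative only.
class CharsetProbe {
    // Unicode escape for a-umlaut: independent of the source file's encoding.
    public static final char A_UMLAUT = '\u00e4';

    // Renders every char of s as U+XXXX so a mismatch is visible in a log.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Decodes raw bytes with an explicit charset, not the platform default.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Record the environment the JVM currently believes it is running in.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // 0xE4 is a-umlaut in both ISO-8859-1 and ISO-8859-15.
        String decoded = decodeLatin1(new byte[] { (byte) 0xE4 });
        System.out.println("code points:     " + describe(decoded));
        System.out.println("matches umlaut:  " + (decoded.charAt(0) == A_UMLAUT));
    }
}
```

Calling describe() on the string just before the failing charAt comparison,
once on a freshly started server and once after the problem has appeared,
would show whether the string content itself has changed or only the constant
it is being compared against.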