Message-ID: <4B850EF3.8070406@ice-sa.com>
Date: Wed, 24 Feb 2010 12:35:15 +0100
From: André Warnier
To: Tomcat Users List
Subject: Re: Encoding problem with Tomcat (hibernate) + Postgres
In-Reply-To: <27714136.post@talk.nabble.com>

davefu wrote:
> Hi, this is my setup:
>
> - Debian Lenny
> - Tomcat 5.5
> - Postgres 8.3
>
> I'm running an app which is failing everytime it tries to get some data from
> the DB with characters like [ÁÉÍÓÚáéíóú]. By "failing" I mean the
> application isn't showing the data it should when Tomcat throws querys to
> Postgres.
>

Hola.
There is no problem with your English, you are doing fine.

Where is it showing this (wrong) data? Do you mean in the result page which you see in the browser? Have you saved this page to disk, and examined it with an editor, to see what is really contained in that html file?

I don't know what the exact problem might be, but let me give you (1) my sympathy, because these problems are usually horrible, and (2) a recommendation first of all, before even starting to decipher the problem:

You have to question *everything* you see, and not take anything for granted.

For example, when you edit a logfile with an editor, you have to question whether what you see on the screen through this editor is really what is in the logfile, byte-by-byte. If possible, use a non-UTF8 locale, and an editor which is /not/ UTF-8 aware, and really shows you the /bytes/ in the logfile, rather than the /characters/. It may be less readable, but at least you will be sure that you really see what is in the logfile.

Then you have to ask yourself the question: does the program which writes the logfile write it "as characters" using a UTF-8 encoding, or not? (it is not as evident as one may think at first)

This may all sound silly, but I can guarantee that if you do not ask yourself these questions first, and at every step, you may be going down the wrong track when trying to understand what is going on.
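If no byte-oriented editor is at hand, a few lines of script can do the same job. Below is a minimal sketch in Python (chosen only because it is short). The function name show_bytes and the sample string are made up for illustration; they are not taken from the application or from its logfile.

```python
def show_bytes(data: bytes) -> str:
    """Return the bytes as space-separated hex pairs, with no decoding at all."""
    return " ".join(f"{b:02x}" for b in data)


# The same five characters, as two hypothetical programs would write them
# when each uses a different encoding.
as_utf8 = "MARÍA".encode("utf-8")
as_latin1 = "MARÍA".encode("iso-8859-1")

print(show_bytes(as_utf8))    # 4d 41 52 c3 8d 41
print(show_bytes(as_latin1))  # 4d 41 52 cd 41
```

For a real logfile, the bytes would come from opening the file in binary mode (open(path, "rb")), so that nothing decodes them on the way in. A UTF-8 aware editor would show both byte sequences above as the same five characters, as long as it guesses the encoding right.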
For example, when you say that you see this in the logfile:

"...(TRANSLATE(upper(cxfaexpedi0_.nombre_madre), 'ÁÉÍÓÚáéíóúñÑ', 'AEIOUAEIOUNN')like TRANSLATE(upper('%MARÍA%')..."

is that first "capital A with acute accent" really one byte /in the logfile/, or is it itself already the 2-byte UTF-8 encoding of that character, which you see on your screen as a single accented A, because your editor and your locale have conspired to translate it visually? (and, when you post it here, has it been re-encoded one more time by the email program? ;-) )

(Note that it also looks like, in the snippet above, you have more than just UTF-8 encoding going on; there are also "entities" such as "‰". Where do these come from?)

Next, when you change a setting which has an impact on the encoding, do it one change at a time, and double-check the result along the lines above.

Some of the things which may have an impact are:
- the default system locale
- the "locale" of the process which is running the database
- the encoding settings of the database itself
- the "locale" of the process under which Tomcat is running
- whether or not the application "streams" which are used to communicate with the database use the "default platform encoding", or have a specific encoding specified when opening the stream
- the locale of the process you are using to run the editor which you use to look at the logfiles
- the settings of that editor
- and probably quite a few others which I forget

Each one of these may cause some intermediate translation which is not evident at first.

Another example: XML files have, by definition, a "default encoding" which is UTF-8, or else the encoding is specified in the leading XML declaration. XML parsers know this, and will read the file in the appropriate encoding. The same is probably true for HTML parsers (although in that case the default should be ISO-8859-1). So for example a JSP document written in XML syntax will be read correctly by Tomcat. A JSP page in standard syntax is read with the encoding given by its pageEncoding attribute (or the charset of its contentType), and as ISO-8859-1 if neither is present.
But the same is not true for sockets that Tomcat may open to talk to some external software. A socket itself only carries bytes; if the reader or writer wrapped around it is created without specifying an encoding, then it will use the "default platform encoding", which in the case of Tomcat is the encoding of the process running the JVM which runs Tomcat.

And the same is also not true for anything that goes over the HTTP protocol. There the default is ISO-8859-1, unless explicitly specified otherwise.

So yes, it is a mess, be prepared. But it is also interesting.
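To illustrate how such a silent intermediate translation can happen, here is a small Python sketch (the principle is the same in Java). The sample text and the choice of encodings are assumptions for illustration. In particular, the last step is only one possible explanation for a stray "‰", not something established in this thread.

```python
text = "MARÍA JOSÉ"  # sample value, not taken from the real database

# One side writes the text as UTF-8 bytes ...
wire = text.encode("utf-8")

# ... and the other side reads those bytes with a different encoding.
# Nothing fails; the result is simply wrong.
misread = wire.decode("iso-8859-1")
print(len(text), len(wire), len(misread))  # 10 12 12

# Reading with the same encoding that was used for writing gives the text back.
assert wire.decode("utf-8") == text

# If the misread text is then written out again as UTF-8, it is now
# double-encoded: each accented character takes 4 bytes instead of 2.
double = misread.encode("utf-8")
print(len(double))  # 16

# Assumption: the reading side uses windows-1252. In that encoding the
# byte 0x89 (the second byte of UTF-8 "É") is the per-mille sign.
print("É".encode("utf-8").decode("cp1252"))  # Ã‰
```

The point is that no step raises an error, so each hop (database, driver, Tomcat, HTTP, browser, editor) has to be checked separately, at the byte level.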