From: André Warnier
To: Tomcat Users List
Date: Fri, 12 Sep 2008 16:57:50 +0200
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem
Message-ID: <48CA836E.6000809@ice-sa.com>
In-Reply-To: <427155180809120505k56fe505k3dd5ba61debfda7a@mail.gmail.com>

Konstantin Kolinko wrote:
> 2008/9/12 André Warnier
>
>> Konstantin Kolinko wrote:
>>
>>> 2008/9/12 André Warnier:
>>>
>>>> Caldarale, Charles R wrote:
>>>>
>>>>> I'm not sure these days what the "normal web character set"
>>>>> really is. If you're referring to ASCII (aka Basic Latin), then
>>>>> no, the Pound Sterling symbol is not present. However, for any
>>>>> of the ISO-8859-x variants, it is present, using the 163 (0xA3)
>>>>> value you noted (same as the Unicode code point). It's also in
>>>>> UTF-8 of course, but requires two bytes (0xC2 0xA3) to represent
>>>>> the code point.
>>>>>
>>>> I love these discussions about character sets. They seem to
>>>> confuse so many people; even I, who have been involved in them
>>>> for 30 years...
>>>>
>>>> Anyway, I have a related question, which I don't think
>>>> constitutes a hijack of this thread, because the underlying
>>>> cause is probably similar.
>>>> Here it goes :
>>>>
>>>> Tomcat (v 4.1, v 5.0, v5.5, have not tried yet in 6.x)
>>>> The above Tomcat's running under the same Linux or Solaris,
>>>> essentially set up the same way.
>>>> The JVM may vary, but I don't think that is the problem,
>>>> because of the consistency of the problem as explained below.
>>>> I am running a webapp from an external supplier, always the same
>>>> binary version. I don't have the code, can't see what's in it.
>>>> The pages served by that webapp are the same html pages, all of
>>>> them having a declaration .
>>>> The pages also *are* properly encoded as iso-8859-1 (100%
>>>> positive, I know the difference).
>>>> The browser receiving the pages is always the same one, same
>>>> settings.
>>>>
>>>> Now,
>>>>
>>>> case a)
>>>> in the Tomcat startup files, I do nothing, meaning I just take
>>>> Tomcat out-of-the-box and run the webapp.
>>>> Result : in any such html page that contains characters with an
>>>> ISO-8859 codepoint above \xA0 (meaning the displayable characters
>>>> of the "high" part of the table, where one finds things like
>>>> "uppercase A with umlaut"), these characters
>>>> - appear in the browser display as "?" (minus the quotes)
>>>> - also if I save the page from the browser to disk, and look at
>>>>   them with an iso-8859-1 capable editor, they are effectively
>>>>   "?".
>>>> (So it's not the browser misunderstanding them, it is Tomcat
>>>> sending them that way).
>>>>
>>>> case b)
>>>> In one of the Tomcat startup files (e.g.
>>>> tomcat_dir/bin/startup.sh or even in /etc/init.d/tomcat5.5), I
>>>> add the following line
>>>> LC_CTYPE="en_us.iso88591"
>>>> (or whatever is valid on that host to specify an iso-8859-1
>>>> LC_CTYPE)
>>>> (before the actual start of Tomcat)
>>>> and restart Tomcat
>>>> then the same page displays properly in the browser, and also is
>>>> correct iso-8859-1 when saved to disk and examined with the
>>>> editor.
>>>> (In other words, what previously were "?" characters, are now the
>>>> correct iso-8859-1 character bytes).
>>>>
>>>> Now my question is :
>>>> How can it matter which LC_CTYPE Tomcat is started under, that
>>>> would have the result above ?
>>>> The behaviour above is consistent across different hosts, across
>>>> the same or different Tomcat versions, it is always the same
>>>> webapp, always the same html pages, always the same browser, etc.
>>>> Only that LC_CTYPE line changes the behaviour.
>>>> On the face of it, the only thing I can think of that would
>>>> explain this, is that the webapp in question does something
>>>> wrong, but what exactly could it be doing ?
>>>> Any ideas ?
>>>>
>>> It is <%@page pageEncoding="..." %> that is missing from those
>>> pages. Thus JSP compiler does not know what encoding they are
>>> using for their source and messes them at compilation time.
>>>
>> [...]
>>
>> But these pages, as far as Tomcat and the webapp are concerned, are
>> not dynamic in any way. They are straight static html pages.
>> So is the JSP stuff relevant ?
>> (I'm genuinely asking, since I know nothing about JSP pages)
>>
> The static HTML pages, as well as all the other static files, are
> served by the DefaultServlet. You should dig there. I think that
> fileEncoding initialization parameter of the servlet, as well as
> settings in web.xml come into play.
>
> JSP settings are irrelevant for them, of course.
>

Hi.
Thanks for the intent and answer above.
But I insist : these html pages are served by that webapp of which I am
talking, not by the DefaultServlet.
Those pages are being accessed via URLs like
http://myhost.mycompany.com/myservlet?..(additional parameters indicating which static file to serve)..
It is on the way through that servlet that they get "corrupted", unless
I start Tomcat with LC_CTYPE="iso-8859-1".
That servlet, in its own web.xml config file in
tomcat_dir/webapps/myservlet/WEB-INF/web.xml, has no fileEncoding nor
mime-mapping section nor parameter.

So my question remains, I think : what could be going on in that servlet
so that :
- if LC_CTYPE is not set in the environment *of Tomcat* when it starts,
  the upper iso-8859-1 characters in the pages are replaced by "?"
- if LC_CTYPE is set to "iso-8859-1" in the Tomcat environment when it
  starts, then the pages delivered by the servlet are correct ?

I am not very qualified in Java, but could it be something like :
- the servlet reads those documents with some InputStream, without
  specifying a character set or encoding, and by default that means to
  use Tomcat's idea of its default LC_CTYPE for those InputStreams ?
- or the servlet outputs the document via an OutputStream without
  specifying an encoding etc.. ?

André
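
The hypothesis in the two bullets above can be pictured with the small Java sketch below. It is only an illustration under stated assumptions, not the supplier's code (which is not available): the class name DefaultCharsetDemo and the helper copyAsText are made up for the example, and the charset is passed in explicitly to stand in for the JVM default that a Reader or Writer silently uses when none is specified. As far as I know, on Linux and Solaris JVMs of that generation the default charset is derived from the locale at startup, so a Tomcat started without LC_CTYPE would typically end up with US-ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class DefaultCharsetDemo {

    // Copies bytes through a Reader/Writer pair, as a servlet might do if it
    // treats a static file as text. 'platform' stands in for the JVM default
    // charset that is used when no encoding is given to the Reader/Writer.
    static byte[] copyAsText(byte[] source, Charset platform) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(source), platform);
             Writer out = new OutputStreamWriter(sink, platform)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        // The writer has been flushed and closed by now.
        return sink.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'A', then 0xC4 (A with umlaut) and 0xA3 (pound sign) in ISO-8859-1.
        byte[] page = { 'A', (byte) 0xC4, (byte) 0xA3 };

        // What this JVM picked as its default when it started.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Default charset US-ASCII (assumed for "LC_CTYPE not set").
        System.out.println(hex(copyAsText(page, StandardCharsets.US_ASCII)));   // 41 3F 3F

        // Default charset ISO-8859-1 (assumed for LC_CTYPE set to an iso-8859-1 locale).
        System.out.println(hex(copyAsText(page, StandardCharsets.ISO_8859_1))); // 41 C4 A3
    }
}
```

With US-ASCII, every byte above 0x7F is decoded to the replacement character U+FFFD and then written out as "?" (0x3F), which would match the symptom in case a). With ISO-8859-1 the bytes pass through unchanged, as in case b). A servlet that copied the file with plain InputStream and OutputStream calls, without any Reader or Writer in between, should not show this effect at all.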