From: "Anil K. Vijendran"
To: tomcat-dev@jakarta.apache.org
Date: Tue, 02 Nov 1999 11:02:02 -0800
Subject: Re: JASPER: page charset handling broken

Michal Mosiewicz wrote:

> "Anil K. Vijendran" wrote:
> >
> > [Moving the discussion to tomcat-dev]
> >
> > I recently heard about this myself from one of the users of this JSP
> > engine. I believe the way it is supposed to work is that you read until you
> > encounter contentType and then re-read the file using the encoding you saw
> > in contentType. Right now, the JSP engine always uses the encoding obtained
> > from System.getProperty("file.encoding", "8859_1").
>
> It seems that there is more than one bug...

Quite possible :-)

> I have done exactly what you're talking about. I.e. I changed
> createJspReader to take an additional encoding parameter, and changed
> Compiler to read the file a second time if it turns out that it was read
> using a different encoding.

Let's investigate this a bit more and then I can commit your patch.
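A minimal sketch of the two-pass read described above, for illustration only: the class JspCharsetSniffer, its methods, and the regular expression are hypothetical and are not Jasper's createJspReader or Compiler code. It assumes the charset is declared as contentType="...; charset=..." inside a <%@ page %> directive and that the page uses an ASCII-compatible encoding, so a first pass decoded as ISO-8859-1 (which maps every byte to exactly one char) can always find the directive text. The second pass then re-decodes the same bytes with the declared charset, falling back to a caller-supplied default (for example the file.encoding value) when none is found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating two-pass JSP page decoding; not Jasper's API. */
class JspCharsetSniffer {

    // Finds charset=XYZ inside contentType="..." of a page directive. Simplified:
    // ignores JSP comments, single-quoted attributes, and included files.
    private static final Pattern CHARSET = Pattern.compile(
            "<%@\\s*page\\b[^%]*contentType\\s*=\\s*\"[^\"]*charset\\s*=\\s*([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** First pass: decode as ISO-8859-1 (one char per byte) and look for a declared charset. */
    static Charset detectCharset(byte[] page, Charset fallback) {
        String latin1 = new String(page, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(latin1);
        if (m.find()) {
            String name = m.group(1);
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    /** Second pass: decode the whole page with the charset found by the first pass. */
    static String readPage(byte[] page, Charset fallback) {
        return new String(page, detectCharset(page, fallback));
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        // Polish sample text written as Java escapes so this file stays ASCII-only.
        String polish = "za\u017C\u00F3\u0142\u0107";
        String source = "<%@ page contentType=\"text/html; charset=ISO-8859-2\" %>\n" + polish;
        byte[] bytes = source.getBytes(latin2);

        System.out.println(detectCharset(bytes, StandardCharsets.ISO_8859_1)); // ISO-8859-2
        System.out.println(readPage(bytes, StandardCharsets.ISO_8859_1).endsWith(polish)); // true
    }
}
```

Decoding the first pass as ISO-8859-1 rather than with file.encoding means the scan never fails on undecodable bytes; it would not work for UTF-16 pages, which need byte-order-mark detection first.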
I'm hoping to hear from folks who implement XML parsers :-) since they have
to deal with similar issues.

> The result is somewhat strange... If I set 'charset=iso-8859-1', I can
> see that the content of the resulting page matches what I typed. However, if
> I try using iso-8859-2, I can see in the page source that it looks as if
> it was interpreted as a Unicode string...
>
> For example, using the following characters (excuse these 8859-2
> chars): "[ISO-8859-2 characters]", I get them exactly the same in the
> resulting page if I set charset=iso-8859-1. Of course they are interpreted
> improperly by the browser, because the charset is obviously wrong, but the
> codes match. However, if I set iso-8859-2, I get something like
> '|zóB|z|zzzóB' as the result, and
> "...|z\u00f3B\u0005|z\u0005|zzz\u00f3B\u0005..." in the page source.
>
> It seems like setting iso-8859-2 makes my JVM interpret the stream as
> Unicode???
>
> -- Mike

--
Peace, Anil +<:-)
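To illustrate why the charset used to read the page and the charset used to write the response have to agree, here is a small hypothetical demo (CharsetRoundTrip is not Jasper code, and it does not attempt to explain the \u0005 escapes Mike saw). Reading ISO-8859-2 bytes as ISO-8859-1 and writing them back as ISO-8859-1 returns the original bytes unchanged, which is consistent with the codes matching when charset=iso-8859-1 was set. Reading them correctly as ISO-8859-2 but encoding the response as ISO-8859-1 silently replaces every character outside Latin-1 with '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo of reader/writer charset mismatches; not Jasper code. */
class CharsetRoundTrip {

    // Decode page bytes with one charset, then encode the response with another.
    static byte[] roundTrip(byte[] pageBytes, Charset readAs, Charset writeAs) {
        String decoded = new String(pageBytes, readAs);
        // getBytes(Charset) substitutes the charset's replacement bytes for unmappable chars.
        return decoded.getBytes(writeAs);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset latin2 = Charset.forName("ISO-8859-2");
        // o-acute, l-stroke, a-ogonek, z-dot: bytes F3 B3 B1 BF in ISO-8859-2.
        byte[] typed = "\u00F3\u0142\u0105\u017C".getBytes(latin2);

        // Read and write as 8859-1: each byte maps to one char and back, so the bytes match.
        System.out.println(Arrays.equals(typed, roundTrip(typed, latin1, latin1))); // true

        // Read and write as 8859-2: the bytes match and the decoded chars are the intended ones.
        System.out.println(Arrays.equals(typed, roundTrip(typed, latin2, latin2))); // true

        // Read as 8859-2 but write as 8859-1: only o-acute survives; the rest become '?' (63).
        System.out.println(Arrays.toString(roundTrip(typed, latin2, latin1))); // [-13, 63, 63, 63]
    }
}
```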