Mailing-List: contact tomcat-dev-help@jakarta.apache.org; run by ezmlm
Message-ID: <381F352A.3BCE745D@Eng.Sun.COM>
Date: Tue, 02 Nov 1999 11:02:02 -0800
From: "Anil K. Vijendran" <Anil.Vijendran@eng.sun.com>
MIME-Version: 1.0
To: tomcat-dev@jakarta.apache.org
Subject: Re: JASPER: page charset handling broken
References: <381EEEC6.5FEB4E83@interdata.com.pl>
 <381EFAFD.FD4020DF@Eng.Sun.COM> <381F3458.C83AD329@interdata.com.pl>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit


Michal Mosiewicz wrote:

> "Anil K. Vijendran" wrote:
> >
> > [Moving the discussion to tomcat-dev]
> >
> > I recently heard about this myself from one of the users of this JSP
> > engine. I believe the way it is supposed to work is that you read until you
> > encounter contentType and then re-read the file using the encoding you saw
> > in contentType. Right now, the JSP engine always uses the encoding obtained
> > using System.getProperty("file.encoding", "8859_1").
>
> It seems that there is more than one bug...

Quite possible :-)

> I have done exactly what you're talking about, i.e. I changed
> createJspReader to pass an additional encoding parameter, and changed
> Compiler to read files twice if it turns out that a file was read using
> a different encoding.

Let's investigate this a bit more and then I can commit your patch. I'm hoping to
hear from folks that implement XML parsers :-) since they have to deal with
similar issues.
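A minimal Java sketch of the two-pass read discussed above, under stated assumptions: the class `TwoPassJspReader`, its methods and its regular expression are hypothetical and are not Jasper's actual `createJspReader` or `Compiler` code. The first pass decodes with ISO-8859-1 rather than `file.encoding`, assuming the page directive itself is plain ASCII (ISO-8859-1 maps every byte to a character, so it never fails to decode). A real implementation would parse the page directive properly instead of pattern-matching it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassJspReader {
    // Hypothetical, simplified pattern: finds charset=XXX inside contentType="...".
    private static final Pattern CHARSET = Pattern.compile(
        "contentType\\s*=\\s*\"[^\"]*charset\\s*=\\s*([A-Za-z0-9._:-]+)");

    // Returns the charset named in the first contentType attribute, or null if none.
    public static String findCharset(String text) {
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Pass 1 decodes with the default charset only to locate the directive.
    // Pass 2 re-decodes the same bytes if the page declares a different charset.
    public static String readPage(byte[] raw, Charset defaultCharset) {
        String firstPass = new String(raw, defaultCharset);
        String declared = findCharset(firstPass);
        if (declared == null) {
            return firstPass;
        }
        Charset pageCharset = Charset.forName(declared);
        if (pageCharset.equals(defaultCharset)) {
            return firstPass;
        }
        return new String(raw, pageCharset);
    }

    public static void main(String[] args) {
        byte[] header = "<%@ page contentType=\"text/html; charset=iso-8859-2\" %>\n"
            .getBytes(StandardCharsets.US_ASCII);
        // Append byte 0xB3: U+0142 (l with stroke) in ISO-8859-2, U+00B3 in ISO-8859-1.
        byte[] raw = Arrays.copyOf(header, header.length + 1);
        raw[header.length] = (byte) 0xB3;

        String page = readPage(raw, StandardCharsets.ISO_8859_1);
        System.out.printf("last char: U+%04X%n", (int) page.charAt(page.length() - 1));
    }
}
```

Run as-is, it prints `last char: U+0142`: byte 0xB3 is re-read as the ISO-8859-2 letter rather than the ISO-8859-1 superscript three. Keeping the raw bytes in memory means the second pass re-decodes them instead of opening the file again.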

> The result is somewhat strange... If I set 'charset=iso-8859-1', I can
> see that the content of the resulting page matches what I typed. However, if
> I use iso-8859-2, I can see in the page source that it looks
> as if it was interpreted as a Unicode string...
>
> For example, if I use the following characters (excuse the 8859-2
> characters): "��󳱿�������", they come out exactly the same in the resulting
> page when I set charset=iso-8859-1. Of course the browser interprets them
> incorrectly, because the charset is obviously wrong, but the codes
> match. However, if I set iso-8859-2, I get something like
> '|z�B|z|zzz�B' as the result, and
> "...|z\u00f3B\u0005|z\u0005|zzz\u00f3B\u0005..." in the page source.
>
> It seems as if setting iso-8859-2 makes my JVM interpret the stream as
> Unicode???
>
> -- Mike
>
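One hypothesis consistent with Mike's output, though nobody in the thread has confirmed it: the page is decoded correctly as ISO-8859-2, but somewhere on the output path each 16-bit `char` is truncated to its low 8 bits, as a writer that casts every `char` to a `byte` would do. Under ISO-8859-1 every decoded char is at most U+00FF, so truncation changes nothing and the codes match; under ISO-8859-2, ł (U+0142) becomes 0x42 `B`, ą (U+0105) becomes 0x05 and ż (U+017C) becomes 0x7C `|`. The sketch below checks this arithmetic on the bytes 0xF3 0xB3 0xB1 0xBF (ó, ł, ą, ż in ISO-8859-2); the class `LowByteDemo` and its methods are hypothetical and do not reproduce Jasper's output code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LowByteDemo {
    // Hypothetical helper: decode the bytes with the given charset, then keep only
    // the low 8 bits of each char, as a writer that casts every char to a byte would.
    public static String decodeThenTruncate(byte[] raw, Charset cs) {
        StringBuilder out = new StringBuilder();
        for (char c : new String(raw, cs).toCharArray()) {
            out.append((char) (c & 0xFF)); // the high byte is silently dropped
        }
        return out.toString();
    }

    // Render each char as a Java-style escape so control characters stay visible.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In ISO-8859-2 these bytes are U+00F3, U+0142, U+0105, U+017C.
        byte[] raw = { (byte) 0xF3, (byte) 0xB3, (byte) 0xB1, (byte) 0xBF };
        Charset latin2 = Charset.forName("ISO-8859-2");
        System.out.println("8859-1: " + escape(decodeThenTruncate(raw, StandardCharsets.ISO_8859_1)));
        System.out.println("8859-2: " + escape(decodeThenTruncate(raw, latin2)));
    }
}
```

It prints `8859-1: \u00f3\u00b3\u00b1\u00bf` and `8859-2: \u00f3\u0042\u0005\u007c`. The second line reads `óB\u0005|`, the same run that appears in the page source Mike quoted, which is why truncation on output looks like a more likely culprit than the JVM reading the stream as Unicode.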

--
Peace, Anil +<:-)