tomcat-dev mailing list archives

From Michal Mosiewicz <m...@interdata.com.pl>
Subject Re: JASPER: page charset handling broken
Date Tue, 02 Nov 1999 18:58:32 GMT
"Anil K. Vijendran" wrote:
> 
> [Moving the discussion to tomcat-dev]
> 
> I recently heard about this myself from one of the users of this JSP
> engine. I believe the way it is supposed to work is that you read until you
> encounter contentType and then re-read the file using the encoding you saw
> in contentType. Right now, the JSP engine always uses the encoding obtained
> using System.getProperty("file.encoding", "8859_1").

It seems that there is more than one bug...

I have done exactly what you're talking about, i.e. I changed
createJspReader to take an additional encoding parameter, and changed
Compiler to read the file a second time if it turns out that the file
was first read using a different encoding.
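
As a rough illustration of that two-pass idea (this is not the actual
patch; TwoPassPageReader, findCharset and read are hypothetical names,
and the regular expression is only an assumed, simplified way of
spotting the charset in contentType), the logic could look like this:

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "read once, look for contentType, re-read".
final class TwoPassPageReader {
    // Assumed, simplified pattern for charset=... inside contentType="...".
    private static final Pattern CHARSET = Pattern.compile(
        "contentType\\s*=\\s*[\"'][^\"']*charset\\s*=\\s*([\\w.:-]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or null if none is declared.
    static String findCharset(String pageText) {
        Matcher m = CHARSET.matcher(pageText);
        return m.find() ? m.group(1) : null;
    }

    // First pass: decode with the default encoding and look for a charset.
    // Second pass: decode the same bytes again only if the declared
    // charset differs from the one used for the first pass.
    static String read(byte[] raw, Charset defaultCharset) {
        String firstPass = new String(raw, defaultCharset);
        String declared = findCharset(firstPass);
        if (declared == null) {
            return firstPass;
        }
        Charset pageCharset = Charset.forName(declared);
        return pageCharset.equals(defaultCharset)
            ? firstPass
            : new String(raw, pageCharset);
    }
}
```

The first pass only works if the directive itself is plain ASCII, which
is an assumption of this sketch. Calling
TwoPassPageReader.read(bytes, defaultCharset) with the bytes of a page
that declares charset=iso-8859-2 returns the text decoded as ISO-8859-2.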

The result is somewhat strange... If I set 'charset=iso-8859-1', I can
see that the content of the resulting page matches what I typed. However,
if I try using iso-8859-2, I can see in the page source that it looks
like it was interpreted as a Unicode string...

For example, using the following characters (excuse the 8859-2 chars):
"żźółążźążźźźółą", I get them exactly the same in the resulting
page if I set charset=iso-8859-1. Of course they are interpreted
incorrectly by the browser, because the charset is obviously wrong, but
the codes match. However, if I set iso-8859-2, I get something like:
'|zóB|z|zzzóB' as the result, and
"...|z\u00f3B\u0005|z\u0005|zzz\u00f3B\u0005..." in the page source.

It seems like setting iso-8859-2 makes my JVM interpret the stream as
Unicode???
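
One reading that is consistent with the output above (an assumption,
not something confirmed in this thread) is that the input really is
decoded from ISO-8859-2 into Unicode characters, and that only the low
8 bits of each character survive on the way out. The sketch below
reproduces that arithmetic; LowByteDemo and its methods are
hypothetical names used for illustration only:

```java
import java.nio.charset.Charset;

// Hypothetical demo: decode ISO-8859-2 bytes, then keep only the low byte
// of every resulting Unicode character.
final class LowByteDemo {
    // Decodes raw bytes as ISO-8859-2 (Latin-2) into a Java String.
    static String decodeLatin2(byte[] raw) {
        return new String(raw, Charset.forName("ISO-8859-2"));
    }

    // Simulates an output path that drops the high byte of each char.
    static String truncateToLowBytes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }
}
```

In ISO-8859-2 the letters ż, ź, ó, ł, ą are the bytes 0xBF, 0xBC, 0xF3,
0xB3, 0xB1. Decoded, they become U+017C, U+017A, U+00F3, U+0142 and
U+0105, whose low bytes are 0x7C ('|'), 0x7A ('z'), 0xF3 ('ó'),
0x42 ('B') and 0x05, i.e. the same "|z\u00f3B\u0005" pattern seen in
the page source.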

-- Mike
