Message-ID: <3A42956B.C97C4A96@sun.com>
Date: Thu, 21 Dec 2000 15:42:35 -0800
From: Pierre Delisle
To: tomcat-dev@jakarta.apache.org, nathan.dunn@tenzing.com
Subject: Bug #55: Default for included files is 8859_1, with no option to set otherwise

Trying to close a few Jasper bugs before the holiday break. I'd appreciate at least another pair of eyes to review what I believe should be done on this one...

-- Pierre

----- Bug #55 -----

Synopsis: Default for included files is 8859_1, with no option to set otherwise.

Report Description:

The default for reading an included file is ISO_8859_1. We can, of course, set pageContent to read UTF-8 (which is what we need it to be to support international code). Unfortunately, when there are two or more levels of encoding (or the pageContent type isn't set), the encoding of the JspReader gets set to a hard-coded "ISO_8859_1", and this cannot be changed to anything else via the runtime system properties.

In org.apache.jasper.compiler.JspReader (JspReader.java, line 158), encoding ALWAYS defaults to 8859_1, and file.encoding is ignored even when it is set in the System properties.
This is an easy fix, to set encoding to:

    encoding = System.getProperty("file.encoding", "8859_1");

As it stands, the result, typically, is that the file will flake out and convert all of the non-UTF-8 characters to US-ASCII, @%, etc.

-----

I'm not sure I fully understand what's described there, so here is what I believe should be done.

The "encoding" for a JSP file is currently handled as follows:

1. In Compiler.java, we create a JspReader for the top-level ("including") jsp file using the 8859_1 encoding.

2. Using that JspReader, we check if there is a page directive with 'contentType' specified. If there is, then a new JspReader for the page is created with the encoding set to the "charset" specified in the contentType value of the page directive; otherwise we stick with the default 8859_1 encoding.

3. When a page is included, JspReader.pushFile() is called, and the encoding passed as argument appears to always be null (no encoding attribute can be specified in the "include" directive, so reading 'encoding' off of the attributes appears to be a bug in JspParseEventListener). Because it is null, the encoding always defaults to 8859_1.

If I understand the intent of the bug report correctly, we'd need the following modifications:

- In step 2, if contentType is not specified in the "including" page, set the encoding to be:

      encoding = System.getProperty("file.encoding", "8859_1");

  This means that the default encoding of all JSP files at a site could be defined globally using the system property "file.encoding". I don't think this is spec-compliant, and I would be reluctant to make that change.

- In step 3, use the encoding of the "including" page. This would fix what I believe is a bug in the current implementation.

Comments?

-- Pierre
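
For reviewers, here is a minimal sketch of the resolution order described in steps 2 and 3, with the proposed step 3 change applied. It is not Jasper's actual code: the class EncodingSketch and the methods charsetOf, topLevelEncoding and includedEncoding are hypothetical names used only for illustration, and the charset parsing is deliberately simplified.

```java
// Sketch only: all names here are hypothetical, not part of Jasper.
final class EncodingSketch {
    // Default used when no charset can be determined (as in the current code).
    static final String DEFAULT_ENCODING = "8859_1";

    // Extracts the charset from a contentType value such as
    // "text/html; charset=UTF-8". Returns null when none is present.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive match on the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Step 2: the top-level page uses the charset from its page directive,
    // otherwise the default. The file.encoding property is NOT consulted.
    static String topLevelEncoding(String contentType) {
        String charset = charsetOf(contentType);
        return charset != null ? charset : DEFAULT_ENCODING;
    }

    // Step 3 (proposed): an included file inherits the encoding of the
    // including page instead of falling back to the default on null.
    static String includedEncoding(String includingPageEncoding) {
        return includingPageEncoding != null
                ? includingPageEncoding
                : DEFAULT_ENCODING;
    }
}
```

With this sketch, a top-level page declaring contentType="text/html; charset=UTF-8" resolves to UTF-8, and a file it includes would be read as UTF-8 as well; a page with no contentType, and anything it includes, stays at 8859_1.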