Date: Thu, 7 Dec 2000 11:30:42 +0900 (KST)
From: Pilho Kim
To: tomcat-dev@jakarta.apache.org
Subject: Re: My patches for Tomcat 3.2 wrt mutlibyte characters
In-Reply-To: <20001205182043F.kazama@ingrid.org>

On Tue, 5 Dec 2000, Kazuhiro Kazama wrote:

> From: Pilho Kim
> Subject: My patches for Tomcat 3.2 wrt mutlibyte characters
> Date: Tue, 5 Dec 2000 09:47:38 +0900 (KST)
>
> > Try to visit
> >
> >    http://www.javaclue.org/tomcat/patch32/dopatch.html
> >
> > I hope that those would be adopted in TC 3.2.1.
>
> We are developing similar patches for japanese users. Your patches
> have a few problems:
>
> 1, Don't change a DEFAULT_CHAR_ENCODING constant in
>    src/share/org/apache/tomcat/core/Constants.java
>
> Web i18n basics is to specify "right" charset. If you specify charset
> explicitly in tomcat 3.2, your Web applications work well except for a
> few well-known problems (JSP's include charset & getParameter() etc. -
> they are resolved in Servlet API 2.3 & JSP 1.2).
>
> And "iso-8859-1" is defined as default charset in Servlet API 2.3
> final draft (See 4.9, 5.4, javax.servlet.ServletResponse). It isn't
> desirable to introduce another i18n concept to Servlet API 2.2.
>
> Of course, some internal modules such as DefaultCMSetter must send a
> reply in platform native charset because of native library's localized
> messages etc. In this case, it is better to specify charset
> internally.
>

This is your big mistake. You should know what "default" really means
in an API. What is the default encoding in the JVM?

> 2, Don't change Jasper's default encoding to a platform native
>    character encoding.
>
> This spoils platform independency of JSP files. For example, it is
> very popular to serve JSP files encoded in Shift_JIS (this is used on
> Windows-PC) on an unix-variant server which default charset is EUC-JP.
>
> And JSP files specified charset work well except for "include"
> directive.
>
> JSP 1.2 provide "pageEncoding" attribute for this problem. But there
> are no workaround in JSP 1.1. This is a serious problem. I think
> charset-inheritance mechanism is better if possible.
>
> We have a plan to provide pageEncoding patch (JSP 1.2 feature
> implementation to JSP 1.1) for this problem and this patch will be used
> at user's own risk. But it isn't good to provide it officially.
>

Do you know which encoding is used in class files? It is not
iso-8859-1, but utf8. You should read my patch for ClassName

> 3, Don't use non-IANA charset.
>
> Java's default encoding name is almost converter's name but isn't
> included in IANA registry.
>
> But there is no best way to convert Java's encoding to IANA charset in
> current JDK & Tomcat.
>
> I think that it is reasonable to use
> org.apache.tomcat.util.LocaleToCharsetMap (see ResponseImpl.java).
>
> I send new patch of org.apache.tomcat.context.DefaultCMSetter. This
> patch sets charset to iso-8859-1 in the case that LocaleToCharsetMap
> returns null (specified locale isn't registered), but there may be a
> better code.
>

My main patch solves the problem of mixed URL-encoded strings. You
never mentioned that. Read my patches more carefully.

Kim
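
For readers following the charset points argued above, here is a minimal Java sketch. It is not taken from either set of patches. The class name CharsetSketch and its three methods are hypothetical, and java.nio.charset.Charset only arrived in JDK 1.4, after this thread. It shows three things: decoding a URL-encoded value with an explicitly named charset, repairing a value that was decoded as ISO-8859-1 by mistake, and mapping a Java encoding alias to the JDK's canonical charset name with an ISO-8859-1 fallback.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Tomcat or of any patch.
final class CharsetSketch {

    // Decode a URL-encoded value with an explicitly named charset
    // instead of relying on any default.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Repair text whose bytes were decoded as ISO-8859-1 although the
    // client actually sent them in another charset.
    static String redecode(String misdecoded, Charset actual) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    // Map a Java encoding alias to the JDK's canonical charset name,
    // falling back to ISO-8859-1 when the name is not recognized.
    static String canonicalName(String javaName) {
        try {
            return Charset.forName(javaName).name();
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return "ISO-8859-1";
        }
    }

    public static void main(String[] args) {
        String encoded = "%ED%95%9C"; // UTF-8 bytes of U+D55C, one Hangul syllable

        String wrong = decode(encoded, "ISO-8859-1");
        System.out.println(wrong.length());                    // 3
        System.out.println(decode(encoded, "UTF-8").length()); // 1
        System.out.println(redecode(wrong, StandardCharsets.UTF_8).length()); // 1

        System.out.println(canonicalName("8859_1"));           // ISO-8859-1
        System.out.println(canonicalName("UTF8"));             // UTF-8
        System.out.println(canonicalName("no-such-charset"));  // ISO-8859-1
    }
}
```

The redecode step works only because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can be recovered without loss. The same trick is not safe with a default charset that drops or replaces unmappable bytes.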