tomcat-dev mailing list archives

From kaz...@ingrid.org (Kazuhiro Kazama)
Subject Re: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date Thu, 15 Feb 2001 10:28:14 GMT
From: Hans Bergsten <hans@gefionsoftware.com>
Subject: Re: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date: Wed, 14 Feb 2001 11:47:17 -0800
Message-ID: <3A8AE0C5.880B66B4@gefionsoftware.com>
> I'm afraid I have to -1 this proposal. Sure, it may be a nice feature but it's
> not defined by Servlet 2.2. And, for better or for worse, TC 3.x is the
> Reference Implementation for Servlet 2.2. If we add this behavior to TC 3.x,
> a servlet that takes advantage of it will not be portable to other
> spec-compliant 2.2 containers.

Agreed.

Some vendors have surely already introduced their own encoding
detection methods, as Costin mentioned. But the details of those
detection methods are not published, and they have caused breakage in
complicated environments.

Servlet 2.3 will introduce the setCharacterEncoding() method. It is
simple, but I think it is a good solution.
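
To illustrate the idea behind setCharacterEncoding(), here is a minimal
sketch that does not use the Servlet API itself. The class
ExplicitRequestCharset and the helper decodeParam() are hypothetical names
made up for this example. It only shows that a percent-encoded form value
decodes correctly when the application declares the charset before
decoding, and incorrectly when a fixed default is assumed. It assumes a JDK
that provides the Shift_JIS charset and the Charset overloads of
URLEncoder/URLDecoder (Java 10 or later).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitRequestCharset {
    // Hypothetical helper: decode one percent-encoded form value using the
    // charset that the application has declared for the request.
    static String decodeParam(String rawValue, Charset declared) {
        return URLDecoder.decode(rawValue, declared);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String original = "\u65e5\u672c\u8a9e"; // three kanji characters

        // What a browser would send for a form on a Shift_JIS page.
        String onTheWire = URLEncoder.encode(original, sjis);

        // Declared charset matches the form: the value is recovered.
        System.out.println(decodeParam(onTheWire, sjis).equals(original));

        // Fixed ISO-8859-1 default: the value comes out garbled.
        System.out.println(
            decodeParam(onTheWire, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

In a real Servlet 2.3 container the servlet would call
setCharacterEncoding() on the request before reading any parameter, so
that the container can do the equivalent of the first call above.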

Although some i18n problems are solved in Servlet 2.3 and JSP 1.2, it
is inappropriate to introduce a new spec. I (and perhaps all Japanese
users) hope to move to Servlet 2.3 and JSP 1.2. It would be better to
use the Servlet 2.3 spec in Tomcat 3.3 ... Does that exceed the scope
of Tomcat 3.3?

From: Adalbert Wysocki <waldi@imediation.com>
Subject: RE: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date: Wed, 14 Feb 2001 14:26:19 -0000
Message-ID: <9B3E950CB293D411ADF4009027B0A4D2975A3A@PARSV011>
>  * we suppose that the request's parameters encoding is the one used for the
> response to this request content encoding. If the servlet processing
> generates a result page encoded with Shift_JIS charset, it is reasonable to
> suppose that the incoming form data used for the page generation is encoded
> with the Shift_JIS charset.

There is an exception. In Japan, some systems sometimes accept another
charset, because the JIS character set can be encoded as ISO-2022-JP,
EUC-JP or Shift_JIS, and user-defined HTML forms may be encoded in a
charset other than that of the response. In this case, they use the
"JISAutoDetect" converter, which automatically recognizes the JIS
variant character encodings.
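
As a rough sketch of how the "JISAutoDetect" converter can be used from
Java: the class JisAutoDetectDemo and the helper decodeUnknownJis() are
hypothetical names for this example, and it assumes a JDK that ships the
extended charsets, where "JISAutoDetect" is available as a decode-only
charset. ISO-2022-JP input is used here because its escape sequences make
detection unambiguous. Short EUC-JP or Shift_JIS input can be ambiguous,
in which case the converter has to guess.

```java
import java.nio.charset.Charset;

public class JisAutoDetectDemo {
    // Hypothetical helper: decode bytes whose JIS variant is not known in
    // advance. "JISAutoDetect" can only decode; it cannot encode.
    static String decodeUnknownJis(byte[] raw) {
        return new String(raw, Charset.forName("JISAutoDetect"));
    }

    public static void main(String[] args) {
        // Kanji, hiragana and katakana, written as escapes to stay ASCII-safe.
        String original = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8";

        // Simulate form data that arrived encoded as ISO-2022-JP.
        byte[] wire = original.getBytes(Charset.forName("ISO-2022-JP"));

        System.out.println(decodeUnknownJis(wire).equals(original));
    }
}
```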

From: Adalbert Wysocki <waldi@imediation.com>
Subject: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date: Mon, 12 Feb 2001 18:00:14 -0000
Message-ID: <9B3E950CB293D411ADF4009027B0A4D291FD81@PARSV011>
> NB: A solution would be to overwrite the system property "file.encoding" on
> the command line. But on exotic platforms (such as Korean), overwriting the

In Japan, another solution is used:

    s = new String(s.getBytes("iso-8859-1"), "Shift_JIS");

This method is dirty, but it doesn't change Java's default character
encoding. It also works on a Servlet 2.3 based container, because
Servlet 2.3 defines the default encoding as "iso-8859-1".
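
A self-contained sketch of why the re-decoding trick above works, assuming
the container decoded the raw parameter bytes as ISO-8859-1. The class
ParamRedecode and the helper redecode() are hypothetical names for this
example. ISO-8859-1 maps every byte value to exactly one character, so
getBytes("iso-8859-1") gives back the original bytes unchanged, and they
can then be decoded with the charset the form really used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParamRedecode {
    // Hypothetical helper: undo an ISO-8859-1 mis-decoding and decode the
    // recovered bytes with the charset the client actually used.
    static String redecode(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String original = "\u65e5\u672c\u8a9e"; // three kanji characters

        // Simulate the container: Shift_JIS bytes decoded as ISO-8859-1.
        String asSeenByServlet =
            new String(original.getBytes(sjis), StandardCharsets.ISO_8859_1);

        String repaired = redecode(asSeenByServlet, sjis);
        System.out.println(repaired.equals(original)); // prints "true"
    }
}
```

The trick fails if the container used any charset that does not preserve
all byte values, which is why the ISO-8859-1 default matters.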

Kazuhiro Kazama (kazama@ingrid.org)		NTT Network Innovation Laboratories
