tomcat-users mailing list archives

From "Craig R. McClanahan"
Subject Re: Multibytes
Date Sat, 28 Jul 2001 01:07:36 GMT

On Fri, 27 Jul 2001, Sankaranarayanan Ganapathy wrote:

> I am sending shift_jis multibyte characters as part of an HTTP GET request to
> a servlet. But from the headers I get, I cannot tell what the encoding
> is. I cannot always assume that the charset is shift_jis because it could
> be big5 or something else in other cases. Doesn't the HTTP request protocol
> have a header saying what the encoding is?

Yes it does.  Clients are supposed to declare the character set of the
data they submit in the Content-Type header, like this:

  Content-Type:  application/x-www-form-urlencoded; charset=big5

However, there are some complications in "real life":

* Most browsers don't actually do this.

* Even if they did, it would only work on POST requests.  There is
  nothing in the HTTP/1.1 spec that indicates how the server is supposed
  to decode query string parameters on a GET.
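
To illustrate the second point, here is a minimal sketch using only the
JDK (no servlet API).  It decodes the first four escaped bytes of the
request quoted at the end of this message under two different charsets.
The class and method names are made up for this example, and ISO-8859-1
is chosen only as a single-byte charset for comparison:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: the same escaped bytes read under two charsets.
class QueryCharsetDemo {

    // Percent-decode 'raw', interpreting the resulting bytes in 'charset'.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%8E%A9%93%AE";  // four bytes taken from the quoted request

        String asShiftJis = decode(raw, "Shift_JIS");   // two double-byte characters
        String asLatin1   = decode(raw, "ISO-8859-1");  // four single-byte characters

        System.out.println(asShiftJis.length());           // prints 2
        System.out.println(asLatin1.length());             // prints 4
        System.out.println(asShiftJis.equals(asLatin1));   // prints false
    }
}
```

Nothing in the escaped bytes themselves says which reading is right, so
the server has to be told (or has to guess) the charset.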

> Does anybody know?
> Any help will be greatly appreciated.

One of the API features added in Servlet 2.3 (and hence Tomcat 4.0) is the
ability to call request.setCharacterEncoding() if your application can
determine on its own what the character encoding should be.  As long as
you call this before calling any of the request.getParameter() family of
methods, this will take effect (for both GET and POST parameters).

Additionally, but again only under Servlet 2.3, you might choose to
implement a Filter that does this for you, so that you don't have to
modify every single servlet and JSP page.  An example of such a filter is
included in the "examples" web app with Tomcat 4.0.

For a Servlet 2.2 environment, about all you can do is call
request.getQueryString() -- which should be undecoded -- and decode it
yourself.
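
A rough sketch of what decoding the raw query string by hand might look
like, assuming your application already knows which charset to use.  The
class name RawQueryDecoder and its parse() method are hypothetical
helpers, not part of the servlet API, and the sample string is a
shortened version of the quoted request:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses the string returned by request.getQueryString().
class RawQueryDecoder {

    // Split "a=1&b=2" into pairs and percent-decode each name and value
    // in the given charset.  If a name repeats, the last value wins.
    static Map<String, String> parse(String rawQuery, String charset) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                // URLDecoder also turns '+' into a space.
                params.put(URLDecoder.decode(name, charset),
                           URLDecoder.decode(value, charset));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + charset, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = "message=%8E%A9%93%AE+auto+attendant+not+active";
        String message = parse(raw, "Shift_JIS").get("message");
        System.out.println(message.endsWith(" auto attendant not active"));  // prints true
    }
}
```

In a servlet you would pass request.getQueryString() as the first
argument instead of the literal string.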

> Thanx
> Ganesh

Craig McClanahan

> I have enclosed the HTTP request that my browser sent:
> /MessageServlet?message=%8E%A9%93%AE%92%85%90M%89w%88%F5%91S%91R%8A%88%90%AB
> +auto+attendant+not+active HTTP/1.1
> Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
> application/, application/, application/msword,
> */*
> Accept-Language: en-us
> Accept-Encoding: gzip, deflate
> User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
> Host: localhost:8080
> Connection: Keep-Alive
