tomcat-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Tomcat Wiki] Update of "FAQ/CharacterEncoding" by KonstantinKolinko
Date Sun, 28 Mar 2010 13:20:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change notification.

The "FAQ/CharacterEncoding" page has been changed by KonstantinKolinko.
The comment on this change is: Rearranged.
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding?action=diff&rev1=13&rev2=14

--------------------------------------------------

  = Character Encoding Issues =
  
  == Questions ==
+ 
+  1. '''Why'''
-  1. [[#Q1|What is the default character encoding of the request or response body?]]
+   1. [[#Q1|What is the default character encoding of the request or response body?]]
+   1. [[#Q9|Why does everything have to be this way?]]
+  1. '''How'''
-  1. [[#Q2|How do I change how GET parameters are interpreted?]]
+   1. [[#Q2|How do I change how GET parameters are interpreted?]]
-  1. [[#Q3|How do I change how POST parameters are interpreted?]]
+   1. [[#Q3|How do I change how POST parameters are interpreted?]]
+   1. [[#Q8|What can you recommend to just make everything work? (How to use UTF-8 everywhere).]]
-  1. [[#Q4|How can I test if my configuration will work correctly?]]
+   1. [[#Q4|How can I test if my configuration will work correctly?]]
-  1. [[#Q6|How can I send higher characters in HTTP headers?]]
+   1. [[#Q6|How can I send higher characters in HTTP headers?]]
+  1. '''Troubleshooting'''
-  1. [[#Q8|What can you recommend to just make everything work? -- How to use UTF-8 everywhere.]]
-  1. [[#Q9|Why does everything have to be this way?]]
-  1. [[#Q5|I'm having a problem with character encoding in Tomcat 5]]
+   1. [[#Q5|I'm having a problem with character encoding in Tomcat 5]]
  
  == Answers ==
+ 
+ === Why ===
+ 
  <<Anchor(Q1)>>'''What is the default character encoding of the request or response body?'''
  
  If a character encoding is not specified, the Servlet specification requires that an encoding of ISO-8859-1 is used. The character encoding for the body of an HTTP message (request ''or'' response) is specified in the `Content-Type` header field. An example of such a header is `Content-Type: text/html; charset=ISO-8859-1`, which explicitly states that the default (ISO-8859-1) is being used.
  
  References: [[http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1|HTTP 1.1 Specification, Section 3.7.1]]
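The rule above can be expressed as a small function. The following is a minimal sketch in plain Java, written for illustration only: `BodyCharset` and `charsetOf` are hypothetical names, this is not how Tomcat itself parses the header, and it ignores edge cases such as a `;` inside a quoted parameter value.

```java
// Hypothetical helper: picks the charset for an HTTP message body from the raw
// Content-Type header value, falling back to ISO-8859-1 as the Servlet spec requires.
// Simplified on purpose: a quoted parameter value containing ';' is not handled.
final class BodyCharset {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET; // no Content-Type header at all
        }
        // Parameters follow the media type, each introduced by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // malformed parameter without a value
            }
            if (param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // HTTP allows the value to be quoted: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return DEFAULT_CHARSET; // header present, but no charset parameter
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));      // ISO-8859-1
        System.out.println(charsetOf("text/html; charset=\"UTF-8\""));       // UTF-8
        System.out.println(charsetOf("application/x-www-form-urlencoded")); // ISO-8859-1
    }
}
```

The last call shows the common case for form posts: the header is present but carries no charset, so the ISO-8859-1 default applies.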
  
- <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
  
+ ----
- Tomcat will use ISO-8859-1 as the default character encoding of the entire URL, including the query string ("GET parameters").
- 
- There are two ways to specify how GET parameters are interpreted:
- 
-  1. Set the `URIEncoding` attribute on the <Connector> element in server.xml to something specific (e.g. `URIEncoding="UTF-8"`).
-  1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in server.xml to `true`. This will cause the Connector to use the request body's encoding for GET parameters.
- 
- References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 6 HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP Connector]]
- 
- <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
- 
- POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1). In many cases this is not the preferred interpretation so one can employ a javax.servlet.Filter to set request encodings. Writing such a filter is trivial. Furthermore Tomcat already comes with such an example filter. Please take a look at:
-  5.x::
- {{{
- webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- }}}
-  6.x::
- {{{
- webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- }}}
- 
- <<Anchor(Q4)>>'''How can I test if my configuration will work correctly?'''
- 
- The following sample JSP should work on a clean Tomcat install for any input. If you set the URIEncoding="UTF-8" on the connector, it will also work with method="GET".
- {{{
- <%@ page contentType="text/html; charset=UTF-8" %>
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- <html>
-    <head>
-      <title>Character encoding test page</title>
-    </head>
-    <body>
-      <p>Data posted to this form was:
-      <%
-        request.setCharacterEncoding("UTF-8");
-        out.print(request.getParameter("mydata"));
-      %>
- 
-      </p>
-      <form method="POST" action="index.jsp">
-        <input type="text" name="mydata">
-        <input type="submit" value="Submit" />
-        <input type="reset" value="Reset" />
-      </form>
-    </body>
- </html>
- }}}
- 
- <<Anchor(Q8)>>'''How can I send higher characters in my HTTP headers?'''
- 
- You have to encode them in some way before you insert them into a header. Using url-encoding (`%` + high byte number + low byte number) would be a good idea.
- 
- <<Anchor(Q8)>>'''What can you recommend to just make everything work? -- How to use UTF-8 everywhere.'''
- 
- Using `UTF-8` as your character encoding for everything is a safe bet. This should work for pretty much every situation.
- 
- In order to completely switch to using UTF-8, you need to make the following changes:
- 
-  1. Set {{{URIEncoding="UTF-8"}}} on your <Connector> in `server.xml`. References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|AJP Connector]].
-  1. Use a [[#Q3|character encoding filter]] with the default encoding set to UTF-8
-  1. Change all your JSPs to include charset name in their contentType. For example, use {{{<%@page contentType="text/html; charset=UTF-8" %>}}} for the usual JSP pages and {{{<jsp:directive.page contentType="text/html; charset=UTF-8" />}}} for the pages in XML syntax (aka JSP Documents).
-  1. Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8. Use {{{response.setContentType("text/html; charset=UTF-8")}}} or {{{response.setCharacterEncoding("UTF-8")}}}.
-  1. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
-  1. Disable any valves or filters that may read request parameters before your character encoding filter or jsp page has a chance to set the encoding to UTF-8.  For more information see http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.
  
  <<Anchor(Q9)>>'''Why does everything have to be this way?'''
  
@@ -124, +66 @@

  
  Section 3.1 of the ARPA Internet Text Messages spec (RFC 822) states that headers are always in US-ASCII encoding. Anything outside of that needs to be encoded. See the section above regarding query strings in URIs.
  
+ 
+ ----
+ 
+ === How ===
+ 
+ <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
+ 
+ By default, Tomcat uses ISO-8859-1 to decode the entire URL, including the query string ("GET parameters").
+ 
+ There are two ways to specify how GET parameters are interpreted:
+ 
+  1. Set the `URIEncoding` attribute on the <Connector> element in `server.xml` to a specific encoding (e.g. `URIEncoding="UTF-8"`).
+  1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in `server.xml` to `true`. This causes the Connector to use the request body's encoding for GET parameters as well.
+ 
+ References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 6 HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP Connector]]
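To see why this setting matters, the sketch below decodes the same percent-encoded query string with two charsets using the JDK's `java.net.URLDecoder`. It assumes the browser encoded "café" as UTF-8 (the bytes `C3 A9` for "é"), which is what typically happens when the form page itself was served as UTF-8. `QueryDecoding` is a hypothetical name, and the code only imitates the connector's decoding step; it is not Tomcat's own implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Decodes a percent-encoded query string with a given charset, standing in for
// the connector's URI decoding step (hypothetical helper, not Tomcat code).
final class QueryDecoding {
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String query = "caf%C3%A9"; // "cafe" with e-acute, percent-encoded as UTF-8
        // Default behaviour: the two UTF-8 bytes become two Latin-1 characters.
        System.out.println(decode(query, "ISO-8859-1").equals("caf\u00C3\u00A9")); // true
        // With URIEncoding="UTF-8" the two bytes are read as one character.
        System.out.println(decode(query, "UTF-8").equals("caf\u00E9"));           // true
    }
}
```

The ISO-8859-1 result ("cafÃ©") is the typical symptom of a missing `URIEncoding` setting.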
+ 
+ 
+ ----
+ 
+ <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
+ 
+ POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default (ISO-8859-1) is used. In many cases this is not the preferred interpretation, so you can employ a `javax.servlet.Filter` that sets the request encoding before any parameters are read. Writing such a filter is trivial, and Tomcat already ships with an example filter. Please take a look at:
+  5.x::
+ {{{
+ webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ }}}
+  6.x::
+ {{{
+ webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
+ }}}
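A minimal sketch of what such a filter accomplishes, written without the servlet API so it runs on its own. The class, its methods, and the `ignoreClient` flag are hypothetical and only loosely modelled on the idea of the example filter; check `SetCharacterEncodingFilter.java` for the parameters it actually accepts. In a real filter the chosen encoding would be passed to `request.setCharacterEncoding()` before any parameter is read; here the form body is decoded directly with `java.net.URLDecoder` to show the effect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a character-encoding filter, without the servlet API.
final class FormBodyDecoding {

    // Choose the encoding to apply: keep the client's explicit encoding unless
    // ignoreClient is set; otherwise use the configured default.
    static String effectiveEncoding(String clientEncoding, String configured, boolean ignoreClient) {
        if (ignoreClient || clientEncoding == null) {
            return configured;
        }
        return clientEncoding;
    }

    // Decode an application/x-www-form-urlencoded body into name/value pairs.
    static Map<String, String> parse(String body, String encoding) {
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, encoding), URLDecoder.decode(value, encoding));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
        return params;
    }

    public static void main(String[] args) {
        String body = "mydata=%E2%82%AC10"; // euro sign + "10", sent as UTF-8
        String clientEncoding = null;       // the browser sent no charset
        String encoding = effectiveEncoding(clientEncoding, "UTF-8", false);
        System.out.println(encoding);                                                // UTF-8
        System.out.println(parse(body, encoding).get("mydata").equals("\u20AC10")); // true
        // Without a filter the container default applies and the euro sign
        // turns into three unrelated characters.
        System.out.println(parse(body, "ISO-8859-1").get("mydata").length());       // 5
    }
}
```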
+ 
+ 
+ ----
+ 
+ <<Anchor(Q8)>>'''What can you recommend to just make everything work? (How to use UTF-8 everywhere).'''
+ 
+ Using `UTF-8` as your character encoding for everything is a safe bet: it can represent every Unicode character, so it should work in practically every situation.
+ 
+ To switch completely to UTF-8, make the following changes:
+ 
+  1. Set {{{URIEncoding="UTF-8"}}} on your <Connector> in `server.xml`. References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|HTTP Connector]], [[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|AJP Connector]].
+  1. Use a [[#Q3|character encoding filter]] with the default encoding set to UTF-8.
+  1. Change all your JSPs to include the charset name in their contentType.
+  For example, use {{{<%@page contentType="text/html; charset=UTF-8" %>}}} for the usual JSP pages and {{{<jsp:directive.page contentType="text/html; charset=UTF-8" />}}} for the pages in XML syntax (aka JSP Documents).
+  1. Change all your servlets to set the response content type with a UTF-8 charset.
+  Use {{{response.setContentType("text/html; charset=UTF-8")}}} or {{{response.setCharacterEncoding("UTF-8")}}}, and call it before `response.getWriter()`; a charset set after that call has no effect.
+  1. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
+  1. Disable any valves or filters that may read request parameters before your character encoding filter or JSP page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.
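The reasoning behind "UTF-8 everywhere" can be checked with the JDK's `java.nio.charset` API. This sketch (the class name `Utf8Everywhere` is hypothetical) shows that UTF-8 can encode text that ISO-8859-1 cannot, and that text only survives when the writing side and the reading side agree on the encoding, which is why each step in the list above matters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares what UTF-8 and ISO-8859-1 can represent,
// and what happens when two layers disagree about the encoding.
final class Utf8Everywhere {
    // True if every character of text has a representation in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encode with one charset and decode with another, as happens when e.g. a page
    // is written in UTF-8 but the request is read as ISO-8859-1.
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String sample = "\u20AC \u00E9 \u4E2D"; // euro sign, e-acute, a CJK character
        System.out.println(canEncode(sample, StandardCharsets.UTF_8));      // true
        System.out.println(canEncode(sample, StandardCharsets.ISO_8859_1)); // false
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(sample));      // true
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(sample)); // false
    }
}
```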
+ 
+ 
+ ----
+ 
+ <<Anchor(Q4)>>'''How can I test if my configuration will work correctly?'''
+ 
+ The following sample JSP should work on a clean Tomcat install for any input. Save it as `index.jsp`, since the form posts back to that name. If you also set `URIEncoding="UTF-8"` on the connector, it will work with method="GET" as well.
+ {{{
+ <%@ page contentType="text/html; charset=UTF-8" %>
+ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+ <html>
+    <head>
+      <title>Character encoding test page</title>
+    </head>
+    <body>
+      <p>Data posted to this form was:
+      <%
+        request.setCharacterEncoding("UTF-8");
+        out.print(request.getParameter("mydata"));
+      %>
+ 
+      </p>
+      <form method="POST" action="index.jsp">
+        <input type="text" name="mydata">
+        <input type="submit" value="Submit" />
+        <input type="reset" value="Reset" />
+      </form>
+    </body>
+ </html>
+ }}}
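To try the GET case without typing special characters by hand, you can build the query string programmatically. This is a minimal sketch using `java.net.URLEncoder`; it assumes the page above was saved as `index.jsp` and reads the `mydata` parameter, and the class name `TestPageQuery` is hypothetical. Appended to the page's URL, the result should be echoed as "ä€" when `URIEncoding="UTF-8"` is set; with the default, expect garbled characters instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a UTF-8 query string for the test page's
// "mydata" parameter so the GET case can be tried by pasting a URL.
final class TestPageQuery {
    static String query(String value) {
        try {
            return "mydata=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // a-umlaut followed by the euro sign
        System.out.println("index.jsp?" + query("\u00E4\u20AC")); // index.jsp?mydata=%C3%A4%E2%82%AC
    }
}
```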
+ 
+ 
+ ----
+ 
+ <<Anchor(Q6)>>'''How can I send higher characters in my HTTP headers?'''
+ 
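+ You have to encode them in some way before you insert them into a header. URL-encoding (percent-encoding) is a good choice: convert the value to bytes (e.g. in UTF-8) and write each non-ASCII or otherwise unsafe byte as `%` followed by its two hexadecimal digits.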
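A minimal sketch of that approach using the JDK's `java.net.URLEncoder` and `URLDecoder`. `URLEncoder` implements HTML form encoding, which turns a space into `+`, so the sketch replaces `+` with `%20` to get plain percent-encoding. The class name is hypothetical, and the scheme only works if the code reading the header decodes it the same way. In a servlet the encoded value could then be passed to `response.setHeader(...)` under a header name of your choosing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper: makes a non-ASCII value safe for an HTTP header by
// percent-encoding its UTF-8 bytes, and reverses the process on the reading side.
final class HeaderValueCodec {
    static String encode(String value) {
        try {
            // URLEncoder uses form encoding (space -> "+", literal "+" -> "%2B"),
            // so every "+" in its output stands for a space; use "%20" instead.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String greeting = "Gr\u00FC\u00DFe aus Wien"; // German text with u-umlaut and sharp s
        String encoded = encode(greeting);
        System.out.println(encoded);                          // Gr%C3%BC%C3%9Fe%20aus%20Wien
        System.out.println(decode(encoded).equals(greeting)); // true
    }
}
```

The encoded form contains only US-ASCII characters, so it satisfies the header restriction described in the "Why" section.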
+ 
+ 
+ ----
+ 
+ === Troubleshooting ===
+ 
  <<Anchor(Q5)>>'''I'm having a problem with character encoding in Tomcat 5'''
  
  In Tomcat 5, there have been issues reported with respect to character encoding (usually of the form "request.setCharacterEncoding(String) doesn't work"). Odds are, it's not a bug. Before filing a bug report, see these bug reports as well as any bug reports linked to them:
