tomcat-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Tomcat Wiki] Update of "FAQ/CharacterEncoding" by ChristopherSchultz
Date Fri, 13 Nov 2009 15:17:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change notification.

The "FAQ/CharacterEncoding" page has been changed by ChristopherSchultz.
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding?action=diff&rev1=9&rev2=10

--------------------------------------------------

  
  If a character encoding is not specified, the Servlet specification requires that an encoding
of ISO-8859-1 is used. The character encoding for the body of an HTTP message (request ''or''
response) is specified in the `Content-Type` header field. An example of such a header is
`Content-Type: text/html; charset=ISO-8859-1` which explicitly states that the default (ISO-8859-1)
is being used.
  
+ References: [[http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1|HTTP 1.1 Specification,
Section 3.7.1]]
+ 
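The defaulting rule described above can be sketched in a few lines of Java. This is only an illustration of how a `charset` parameter might be read from a `Content-Type` value, with ISO-8859-1 as the fallback; it is not Tomcat's own parsing code, and the class and method names (`ContentTypeCharset`, `charsetOf`) are made up for this example.

```java
import java.util.Locale;

class ContentTypeCharset {
    // Fallback named by the Servlet specification when no charset is given.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type value, or the default.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        // Parameters follow the media type and are separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // The value may be written as a quoted string.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? DEFAULT_CHARSET : value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("text/html; charset=UTF-8"));      // UTF-8
        System.out.println(charsetOf("text/html"));                     // ISO-8859-1
    }
}
```

In a servlet the container does this work already; `request.getCharacterEncoding()` reports the declared charset, or `null` when the client sent none.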
  <<Anchor(Q2)>>'''How do I change how GET parameters are interpreted?'''
  
  Tomcat will use ISO-8859-1 as the default character encoding of the entire URL, including
the query string ("GET parameters").
@@ -26, +28 @@

  
   1. Set the `URIEncoding` attribute on the <Connector> element in server.xml to something
specific (e.g. `URIEncoding="UTF-8"`).
   1. Set the `useBodyEncodingForURI` attribute on the <Connector> element in server.xml
to `true`. This will cause the Connector to use the request body's encoding for GET parameters.
+ 
+ References: [[http://tomcat.apache.org/tomcat-6.0-doc/config/http.html|Tomcat 6 HTTP Connector]],
[[http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html|Tomcat 6 AJP Connector]]
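To see why the choice of `URIEncoding` matters, the following sketch decodes the same percent-encoded value twice, once as UTF-8 and once as ISO-8859-1. It uses the standard `java.net.URLDecoder` only to illustrate the effect; it is not how the Connector is implemented, and the class name `QueryDecodingDemo` and the sample value are assumptions made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {
    // Decodes one percent-encoded query-string value with the given charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e, as sent by a UTF-8 page.
        String raw = "caf%C3%A9";

        // Decoded as UTF-8, the two bytes C3 A9 form one character (U+00E9).
        System.out.println(decode(raw, "UTF-8").equals("caf\u00e9"));            // true

        // Decoded as ISO-8859-1, each byte becomes its own character (U+00C3, U+00A9).
        System.out.println(decode(raw, "ISO-8859-1").equals("caf\u00c3\u00a9")); // true
    }
}
```

Note that `URLDecoder` also turns `+` into a space, which is form-encoding behaviour rather than plain URI decoding.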
  
  <<Anchor(Q3)>>'''How do I change how POST parameters are interpreted?'''
  
@@ -92, +96 @@

  
   1. [[http://jcp.org/aboutJava/communityprocess/mrel/jsr154/index2.html|Java Servlet Specification
2.5]]
   1. [[http://jcp.org/aboutJava/communityprocess/final/jsr154/index.html|Java Servlet Specification
2.4]]
-  1. [[http://www.w3.org/Protocols/rfc2616/rfc2616.txt|HTTP 1.1 Protocol]]] ([[http://www.w3.org/Protocols/rfc2616/rfc2616.html|hyperlinked
version]])
+  1. [[http://www.w3.org/Protocols/rfc2616/rfc2616.txt|HTTP 1.1 Protocol]] ([[http://www.w3.org/Protocols/rfc2616/rfc2616.html|hyperlinked
version]])
   1. [[http://www.ietf.org/rfc/rfc2396.txt|URI Syntax]]
   1. [[http://www.w3.org/Protocols/rfc822/|ARPA Internet Text Messages]]
   1. [[http://www.w3.org/TR/html4|HTML 4]]
  
+ ''Default encoding for request and response bodies''
+ 
+ See 'Default Encoding for POST' below.
+ 
  ''Default encoding for GET''
  
- The character set for HTTP query strings (that's the technical term for 'GET parameters')
can be found in sections 2 and 2.1 of the "URI Syntax" specification. The character set is defined
to be [[http://en.wikipedia.org/wiki/ASCII|US-ASCII]]. Any character that does not map to
US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax specification says that
characters outside of US-ASCII must be encoded using `%` escape sequences: each character
is encoded as a literal `%` followed by the two hexadecimal digits which indicate its character
code. Thus, `a` (US-ASCII character code 97, or 0x61) is equivalent to `%61`.
+ The character set for HTTP query strings (that's the technical term for 'GET parameters')
can be found in sections 2 and 2.1 of the "URI Syntax" specification. The character set is defined
to be [[http://en.wikipedia.org/wiki/ASCII|US-ASCII]]. Any character that does not map to
US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax specification says that
characters outside of US-ASCII must be encoded using `%` escape sequences: each character
is encoded as a literal `%` followed by the two hexadecimal digits which indicate its character
code. Thus, `a` (US-ASCII character code 97, or 0x61) is equivalent to `%61`. There ''is no default
encoding for URIs'' specified anywhere, which is why there is a lot of confusion when it comes
to decoding these values.
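A short Java sketch makes the escape rule, and the ambiguity it leaves, concrete. The hypothetical helper `percentEncode` below escapes every byte of a string (including bytes that would not normally need escaping) purely to show the mapping; the class and method names are assumptions made for this example.

```java
import java.nio.charset.Charset;

class PercentEncodeDemo {
    // Writes every byte of the text, in the given charset, as a %XX escape.
    static String percentEncode(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so bytes above 0x7F are not printed as negative values.
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A US-ASCII character has one escape, whatever the charset.
        System.out.println(percentEncode("a", "US-ASCII"));        // %61

        // U+00E9 (e with acute accent) escapes differently per charset.
        System.out.println(percentEncode("\u00e9", "ISO-8859-1")); // %E9
        System.out.println(percentEncode("\u00e9", "UTF-8"));      // %C3%A9
    }
}
```

The escape sequence carries only bytes, so a receiver that sees `%C3%A9` cannot tell from the URI alone which charset the sender used.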
  
  Some notes about the character encoding of URIs:
   1. ISO-8859-1 and ASCII are compatible for character codes 0x20 to 0x7E, so they are often
used interchangeably. Most of the web uses ISO-8859-1 as the default for query strings.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

