tomcat-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Tomcat Wiki] Update of "Tomcat/UTF-8" by KonstantinKolinko
Date Sun, 28 Mar 2010 14:21:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change notification.

The "Tomcat/UTF-8" page has been changed by KonstantinKolinko.
The comment on this change is: Removed all content of the page. The up-to-date version of
all this is in FAQ/CharacterEncoding.
http://wiki.apache.org/tomcat/Tomcat/UTF-8?action=diff&rev1=13&rev2=14

--------------------------------------------------

+ This page is obsolete. See [[FAQ/CharacterEncoding|FAQ/CharacterEncoding]] for the up-to-date version.
- 1.
- JSP pages must include the header:
  
+ ----
+ CategoryObsolete
- {{{ <%@ page
-  contentType="text/html; charset=UTF-8"
- %> }}}
  
- 2.
- For translation of inputs coming back from the browser there must be a
- method that translates from the browser's ISO-8859-1 to UTF-8.  ISO-8859-1
- is the default character encoding for servers and browsers according to the
- [[http://www.ietf.org/rfc/rfc2616.txt|HTTP specification]] section 3.4.1.
- 
- {{{  /**
-   * Convert an ISO-8859-1 format string (which is the default sent by IE)
-   * to the UTF-8 format that the database is in.
-   */
-  public String toUTF8(String isoString)
-  {
-   String utf8String = null;
-   if (null != isoString && !isoString.equals(""))
-   {
-    try
-    {
-     byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
-     utf8String = new String(stringBytesISO, "UTF-8");
-    }
-    catch(UnsupportedEncodingException e)
-    {
-     throw new RuntimeException(e);
-    }
-   }
-   else
-   {
-    utf8String = isoString;
-   }
-   return utf8String;
-  } }}}
- I have found that these three steps are all that is necessary to make your
- site accept any language that UTF-8 can work with.  I extend my thanks to
- those of you on the Tomcat users list who helped me find these little gems.
- 
- (from the tomcat-user mailing list) 
- 
- '''Note''' This method is not useful because it doesn't work with non-ASCII characters. "stringBytesISO" is an ISO-8859-1 byte stream. We can't use it as a UTF-8 byte stream if it contains non-ASCII characters.
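
To make the behaviour of this round trip concrete, here is a minimal sketch (not part of the original page). It assumes Java 7 or later for `StandardCharsets`; the class name `RoundTripDemo` and the sample word are made up for illustration. The conversion recovers text only when the bytes were UTF-8 to begin with and were mis-decoded as ISO-8859-1. For text that really is ISO-8859-1 and contains non-ASCII characters, the bytes are not valid UTF-8, which is the failure the note describes.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Re-read the ISO-8859-1 bytes of a string as UTF-8 (same idea as toUTF8 above).
    static String toUTF8(String isoString) {
        if (isoString == null || isoString.isEmpty()) {
            return isoString;
        }
        byte[] raw = isoString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Case 1: UTF-8 bytes that were wrongly decoded as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(toUTF8(misdecoded).equals(original)); // true

        // Case 2: text that was decoded correctly and contains a non-ASCII character.
        // Its single byte 0xE9 is not a valid UTF-8 sequence, so the result is damaged.
        System.out.println(toUTF8(original).equals(original)); // false
    }
}
```

Because the method cannot tell the two cases apart, applying it blindly can corrupt text that was already decoded correctly.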
- 
- '''Alternative solution'''
- 
- The solution suggested above works, but from an architecture perspective the correct way is to add a filter in Tomcat that performs the necessary correction for the deployed application, without any additional changes to the rest of the code.
- 
- 1. Make sure the JSP header is set as suggested:
- {{{
- <%@ page contentType="text/html; charset=UTF-8"%>
- }}}
- 
- 2. Example of filter:
- 
- {{{import java.io.*;
- import java.util.*;
- import javax.servlet.*;
- import javax.servlet.http.*;
- 
- public class CharsetFilter implements Filter
- {
-  private String encoding;
- 
-  public void init(FilterConfig config) throws ServletException
-  {
-   encoding = config.getInitParameter("requestEncoding");
- 
-   if( encoding==null ) encoding="UTF-8";
-  }
- 
-  public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
-  throws IOException, ServletException
-  {
-   // Respect the client-specified character encoding
-   // (see HTTP specification section 3.4.1)
-   if(null == request.getCharacterEncoding())
-     request.setCharacterEncoding(encoding);
- 
-   next.doFilter(request, response);
-  }
- 
-  public void destroy(){}
- }
- }}}
- 
- The corresponding portion of the web.xml configuration looks like:
- 
- {{{  <!--CharsetFilter start-->
- 
-   <filter>
-     <filter-name>Charset Filter</filter-name>
-     <filter-class>CharsetFilter</filter-class>
-       <init-param>
-         <param-name>requestEncoding</param-name>
-         <param-value>UTF-8</param-value>
-       </init-param>
-   </filter>
- 
-   <filter-mapping>
-     <filter-name>Charset Filter</filter-name>
-     <url-pattern>/*</url-pattern>
-   </filter-mapping>
- 
-   <!--CharsetFilter end-->}}}
- 
- The suggested solution originates from [[http://people.comita.spb.ru/users/sergeya/java/ruschars.html|Sergey Astakhov (all texts are in Russian)]] (sergeya@comita.spb.ru)
- 
- '''Important note''': This filter should be as far towards the front of your filter chain as possible. If some other code calls request.getParameter (or a similar method) before this filter is invoked, the encoding will not be set in time, and your parameters will still be decoded incorrectly.
- 
- '''- TIP -'''
- 
- To enable UTF-8 support in the connectors, update the file $CATALINA_HOME/conf/server.xml.
- Example:
- 
- {{{<Connector port="8080"
-            URIEncoding="UTF-8"/>}}}
- 
- or
- 
- {{{<Connector port="8080"
-            useBodyEncodingForURI="true"/>}}}
- 
-  * ''URIEncoding'' specifies the character encoding used to decode the URI.
-  * ''useBodyEncodingForURI'' indicates whether the encoding specified in contentType (or set explicitly with the Request.setCharacterEncoding() method) is also used to decode the URI query parameters. The default value is "false".
- 
- '''Note''' that this changes the behavior of reading GET parameters from the request URI
and will not affect POST parameters at all.
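
As an illustration of why the charset chosen for the URI matters, the sketch below decodes the same percent-encoded value with two charsets. It is not Tomcat's own decoding code: it uses the standard `java.net.URLDecoder`, and the class name `UriDecodingDemo` and the sample value are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingDemo {
    // Decode a percent-encoded query value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "café" as a browser would send it when the page is UTF-8.
        String encoded = "caf%C3%A9";

        // Decoded as UTF-8, the two bytes C3 A9 become the single character é.
        System.out.println(decode(encoded, "UTF-8").equals("caf\u00e9")); // true

        // Decoded as ISO-8859-1, each byte becomes its own character (Ã and ©).
        System.out.println(decode(encoded, "ISO-8859-1").equals("caf\u00c3\u00a9")); // true
    }
}
```

The second result is the kind of garbled value an application would see for a GET parameter when the connector decodes the URI as ISO-8859-1 while the browser sent UTF-8.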
- 
- == See Also ==
-  * http://wiki.apache.org/tomcat/Tomcat/UTF-8
-  * http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
- 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

