tomcat-users mailing list archives

From "Andre E. Bar'yudin" <baryu...@pob.huji.ac.il>
Subject Re: How to UTF-8 your site.
Date Tue, 10 Jun 2003 12:41:49 GMT
Quoting Nikola Milutinovic <Nikola.Milutinovic@ev.co.yu>:

> > > 1.
> > > JSP pages must include the header:
> > > 
> > > <%@ page
> > >  contentType="text/html; charset=UTF-8"
> > > %>
> > 
> > This is if you use JSP.  If you work with servlets, then you should output the
> > appropriate headers.
> 
> Actually, it also sets encoding of the output stream. Or at least it used to
> in some versions of Tomcat. The full declaration would look like this:
> 
> <%@ page
>   contentType="text/html; charset=UTF-8"
>   pageEncoding="UTF-8"
>   import="..."
>   info="..."
> %> 

Well, it does set the encoding of the output stream.  The UTF-8
interpretation of the JSP source, however, was forced by the -Dfile.encoding
switch.
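
To make the output side concrete, here is a minimal sketch using plain java.io
on a modern JDK (not Tomcat's actual code; the class and method names are made
up for illustration).  The charset given to the writer alone decides which
bytes reach the client, regardless of how the source text was read:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputEncodingSketch {

    // Writes the text through a Writer bound to the given charset and
    // returns the raw bytes that would go out on the wire.
    static byte[] render(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(render(text, StandardCharsets.UTF_8).length);      // prints 2
        System.out.println(render(text, StandardCharsets.ISO_8859_1).length); // prints 1
    }
}
```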

> 
> > > 2.
> > > In the catalina.bat (Windows), catalina.sh (Unix), or
> > > apache$jakarta_config.com (OpenVMS) file, a switch must be added to the
> > > call to java.exe.  The switch is:
> > > 
> > > -Dfile.encoding=UTF-8
> > > 
> > > I cannot find documentation for this environment variable anywhere, or what
> > > it actually does, but it is essential.
> > 
> > It's not Tomcat-specific; it should probably be somewhere in the Java
> > specifications.
> 
> Java/Tomcat should be independent of local settings and encodings. Each JSP
> carries sufficient information on its static (pageEncoding) and output
> encoding (contentType). Servlets have to specify this explicitly (set the
> "Content-Type:" header in the ServletResponse and the encoding of the output
> stream).
> 
> Resource files are a different story; again, servlets have to set their
> encoding manually. Again, it should not be a global setting of the JVM.

The JVM, just like any other program, is influenced by the environment in which
it runs, such as the regional settings in Windows, the locale in Unix, and so on.
Sometimes these settings need to be overridden.
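
As a rough illustration (a sketch only; the class and method names are
hypothetical), the snippet below prints what the JVM picked up from its
environment and contrasts it with naming the charset explicitly.  Running it
with and without -Dfile.encoding=UTF-8 should show the override at work on JVMs
that take their default from the locale:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetSketch {

    // Encoding with an explicitly named charset does not depend on the
    // environment the JVM was started in.
    static byte[] encodeExplicit(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Both values reflect the environment (or a -Dfile.encoding override).
        // On JDK 18 and later the default is UTF-8 unless configured otherwise.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String s = "\u0416"; // CYRILLIC CAPITAL LETTER ZHE
        // The length here varies with the platform default charset.
        System.out.println("platform default: " + s.getBytes().length + " byte(s)");
        // This is always 2 bytes (0xD0 0x96).
        System.out.println("explicit UTF-8:   " + encodeExplicit(s).length + " byte(s)");
    }
}
```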

> > > 3.
> > > For translation of inputs coming back from the browser there must be a method
> > > that translates from the browser's ISO-8859-1 to UTF-8.  It seems to me that
> > > ISO-8859-1 is used in all regions, as I have had people in countries such as
> > > Greece & Bulgaria test this and they always send input back in ISO-8859-1
> > > encoding.  The method which you will use constantly should go something
> > > like this:
> > 
> > I wonder why you need this.  I have no need to convert anything into UTF-8 by
> > hand - Tomcat does it for me (and I work not only with European languages).
> > My code includes the following line:
> > 
> > req.setCharacterEncoding("UTF-8");
> > 
> > and everything works OK with IE and Mozilla.
> 
> Yup. One additional word of warning - browsers should be able to support
> multiple client encodings (Windows - switching from one keyboard to another).
> And they should be able to tell the server which encoding was used for the
> data - HTTP/HTML have support for this. The problem is most browsers ignore
> this, so you'll have to assume that the data was encoded using some fixed
> encoding. The problem is present in my country - we use both Cyrillic and
> Latin alphabets. If the page is designated to be windows-1250 encoded (Latin)
> and the user enters data using a Cyrillic keyboard with windows-1251, the
> server will have no way of knowing this. The servlet author is forced to
> assume that the encoding is CP1250 and it will be wrong.

I'm not sure I understand what you mean here.  If you're trying to say that
switching from a Cyrillic to an English keyboard layout influences the charset
of the browser's request, I think you're wrong.

Most browsers encode the request information (we are talking about forms, aren't
we?) in the same charset as that of the original page, so if that charset
supports all the languages of the user input (like cp1251, which can handle both
English and Cyrillic), everything will be OK.

I actually had no problem filling out forms simultaneously in English, Russian,
Hebrew and German - and the input was interpreted correctly by my application
(UTF-8 is used).
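
A minimal sketch of why this works, using java.net.URLEncoder/URLDecoder to
stand in for the browser and the server (an assumption for illustration, not
Tomcat's code; the class and method names are made up).  A form value encoded
in the page's charset round-trips cleanly when decoded with the same charset;
decoding it as ISO-8859-1 garbles it, and the by-hand conversion from point 3
only undoes that wrong decode:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormCharsetSketch {

    // Stand-in for the browser: percent-encode a form value in the page charset.
    static String encodeAs(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Stand-in for the server: decode the value using the charset it assumes.
    static String decodeAs(String wire, String charset) {
        try {
            return URLDecoder.decode(wire, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The manual fix: recover the original bytes from a string that was
    // wrongly decoded as ISO-8859-1, then decode them as UTF-8.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // English, Russian, Hebrew and German in one value.
        String input = "Hello \u041f\u0440\u0438\u0432\u0435\u0442 "
                     + "\u05e9\u05dc\u05d5\u05dd Gr\u00fc\u00dfe";
        String wire = encodeAs(input, "UTF-8");

        System.out.println(decodeAs(wire, "UTF-8").equals(input)); // prints true
        String wrong = decodeAs(wire, "ISO-8859-1");
        System.out.println(wrong.equals(input));                   // prints false
        System.out.println(repair(wrong).equals(input));           // prints true
    }
}
```

If the server is told the right charset up front, as with the
setCharacterEncoding call quoted above, the repair step is never needed.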

Regards,

Andre.

-- 
=============================================================
Andre E. Bar'yudin
Home page:  http://www.cs.huji.ac.il/~baryudin/