tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Migrating to tomcat 6 gives formatted currency amounts problem
Date Fri, 12 Sep 2008 17:21:14 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johnny,

Johnny Kewl wrote:
> Use this function....
> 
> System.out.print("CharSet : " + Charset.defaultCharset().toString());
> 
> and thats what you HAVE TO set your page at....
> 
> On my system it tells me its..... windows-1252

I think you're still missing something: the file on the disk has an
implicit file encoding that is not advertised in any way. This is the
core of the problem.

If all text files said "hey, I'm encoded in UTF-8" or "I'm in
ISO-8859-1" or "This file is WINDOWS-1252", then there would be no
problem: all code would use the native encoding of the file as the
encoding of the HTTP response, and the file would be streamed as binary
without changing a single bit in the stream.

Unfortunately, this kind of "explicit encoding" basically doesn't exist
(except in some UTF-encoded files, which may begin with a byte-order
mark). Since the server doesn't know the file's original encoding, it
/can never make a sensible decision about the output encoding/. It's
simply not possible.

It has nothing to do with your OS, or your filesystem, or your per-user
locale preferences, installed fonts, etc. It has to do with the fact
that the file has no explicit encoding that the server can use. (This is
what gives rise to the MSIE practice of sniffing the document content
regardless of the server's assertion as to the character encoding).
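
To make that concrete, here is a minimal sketch (not anything Tomcat
itself does) of what happens when the same bytes are decoded under
different guesses. The class name EncodingGuessDemo and the sample
text are made up for illustration; only standard java.nio.charset
calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and the sample text are illustrative only.
final class EncodingGuessDemo {

    // Decode raw file bytes under an assumed charset, which is all a
    // server or browser can do when the file does not declare one.
    static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    // Render a string as code points so the output itself stays pure ASCII.
    static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", s.codePointAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00A35"; // pound sign followed by the digit 5
        byte[] savedAsUtf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] savedAsLatin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // Correct guess: the original two characters come back.
        System.out.println(codePoints(decode(savedAsUtf8, StandardCharsets.UTF_8)));
        // Wrong guess: the two UTF-8 bytes of the pound sign become two characters.
        System.out.println(codePoints(decode(savedAsUtf8, StandardCharsets.ISO_8859_1)));
        // Wrong guess the other way: the lone 0xA3 byte is not valid UTF-8.
        System.out.println(codePoints(decode(savedAsLatin1, StandardCharsets.UTF_8)));
    }
}
```

The first line printed is U+00A3 U+0035, the second is
U+00C2 U+00A3 U+0035 (the familiar stray capital A with a circumflex
in front of the pound sign), and the third is U+FFFD U+0035 (the
replacement character). Nothing in the bytes themselves says which
guess was the right one.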

> ... it a headache... rather refactor your code so the pages are all the
> same charset of your choosing and work with &pound, &yen &dollar....

This is always a sensible way to go. If you stick to pages that always
use US-ASCII or anything compatible with it (generally ISO-8859-*, I
think), you'll be good to go.
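
As a rough sketch of the "keep the page in US-ASCII" approach, the
helper below rewrites every non-ASCII character as a numeric
character reference. The class and method names are hypothetical,
and it deliberately does not escape markup characters such as < or
&, so it is not a complete HTML escaper.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class AsciiEscaper {

    // Replace each character outside US-ASCII with a decimal numeric
    // character reference, leaving ASCII characters untouched.
    // Markup characters such as '<' and '&' are NOT escaped here.
    static String toAsciiHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Pound sign and yen sign written as Unicode escapes so this
        // source file is itself pure ASCII.
        System.out.println(toAsciiHtml("\u00A35 or \u00A5700"));
    }
}
```

Running main prints "&#163;5 or &#165;700", which comes out as the
same bytes in US-ASCII, ISO-8859-1, windows-1252 and UTF-8, so the
declared page encoding stops mattering for those characters.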

A much better way to go is to always use properties files for text that
will be displayed on web pages. It's the right thing to do from a
localization perspective (yes, you can have separate pages for each
language, but that's no fun), AND the encoding for Java properties files
is DEFINED TO BE ISO-8859-1, no matter what you want to put in there. In
this case, there /is/ an explicit character encoding, and it's
predictable. Of course, Java coders can always bone the creation of
these files...
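
Here is a small sketch of both the predictable case and one way to
bone it. It relies on the documented behaviour of
java.util.Properties.load(InputStream), which reads ISO-8859-1; the
class name, the key "price.symbol" and the in-memory "files" are
assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; the key name and file contents are illustrative only.
final class PropertiesEncodingDemo {

    // Load properties from raw bytes the same way they would be read
    // from disk: Properties.load(InputStream) decodes as ISO-8859-1.
    static Properties load(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Done right: pure ASCII on disk, with the pound sign written
        // as a Unicode escape inside the properties file.
        byte[] escaped = "price.symbol=\\u00A3\n".getBytes(StandardCharsets.US_ASCII);
        // Done wrong: an editor saved a literal pound sign as UTF-8.
        byte[] savedAsUtf8 = "price.symbol=\u00A3\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(load(escaped).getProperty("price.symbol").length());
        System.out.println(load(savedAsUtf8).getProperty("price.symbol").length());
    }
}
```

The first value is one character long (the pound sign); the second
is two characters long, because the two UTF-8 bytes were each decoded
as a separate ISO-8859-1 character. Sticking to ASCII plus Unicode
escapes in the file (the JDK's native2ascii tool produces this form)
avoids the problem.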

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjKpQoACgkQ9CaO5/Lv0PDW4ACdEHqsgCK2IrHF1Bl6cz40Wben
liYAn00FVbmPpVAl35Zh6nDd1Q5Cxh/d
=4lJ4
-----END PGP SIGNATURE-----
