tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Migrating to tomcat 6 gives formatted currency amounts problem
Date Fri, 12 Sep 2008 16:33:12 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johnny,

Johnny Kewl wrote:
> If this locale stuff is in fact defaulting to an ISO char set that can
> do these symbols... and say you where making a non english page, say
> Japanese... do you think that its possible to use it?

It is up to your browser to choose a font that is appropriate for all
glyphs (that is, a graphical representation of a code point) that need
to be displayed. Some fonts do not support all code points because they
don't have all the glyphs. For instance, if you have a string in English
and also Sanskrit, your browser is likely to display one string in one
font (maybe Arial) and the other in another font (say, Sanskrit).

Let's say that the browser comes across the &pound; entity. &pound; maps
directly to code point U+00A3, which ISO-8859-1 encodes as the single
byte 0xa3
(http://htmlhelp.com/reference/html40/entities/latin1.html). Whether you
put &pound; or £ in your HTML, the browser should render it properly --
possibly switching fonts to one that supports that code point for that
character only.

The problem with your page is not that the £ symbol is not available in
the font the browser chose. Your problem is that you illegally encoded
it into the page in the first place (or, equivalently, you advertise the
wrong encoding for the page, which is really the same thing).
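
To make the mismatch concrete, here is a minimal sketch (the class and
method names are made up for illustration) that encodes £ both ways and
then reads the ISO-8859-1 byte as if it were UTF-8, which is roughly
what a browser does when the declared encoding is wrong:

```java
import java.nio.charset.StandardCharsets;

class PoundBytes {
    // Render bytes as space-separated lowercase hex, e.g. "c2 a3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode ISO-8859-1 bytes as if they had been declared UTF-8.
    static String misreadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign, written as an escape

        System.out.println("ISO-8859-1: "
                + hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      "
                + hex(pound.getBytes(StandardCharsets.UTF_8)));

        // The lone 0xa3 byte is malformed UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        String misread = misreadAsUtf8(pound);
        System.out.println("Misread as: U+"
                + Integer.toHexString(misread.charAt(0)).toUpperCase());
    }
}
```

The same character is one byte (a3) in ISO-8859-1 and two bytes (c2 a3)
in UTF-8, so the bytes on the wire only make sense under the encoding
they were actually written in.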

If you re-write your page to declare some <font> around that symbol, you
will never be able to get it to work, unless you use the browser to
override the server-declared encoding (as Chuck did, when things render
properly when using ISO-8859-1).

> I've actually now seen examples on the web that are doing it Wil's way,
> they using the getCurrencyInstance to make the currency symbols.

Java's built-in currency-symbol-generating methods are likely to
produce a proper £ symbol. If you have your encoding chain set up
properly, it should go from NumberFormat.format() straight to your web
page without a hint of difficulty.
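
As a rough illustration (the class name, the amount, and the choice of
locales are mine, not from the thread), this is all it takes to get a
locale-appropriate currency string:

```java
import java.text.NumberFormat;
import java.util.Locale;

class CurrencyDemo {
    // Format an amount with the currency conventions of the given locale.
    static String format(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.UK, Locale.GERMANY, Locale.JAPAN
        };
        for (Locale locale : locales) {
            // What you see here depends on your console's encoding,
            // which is the same class of problem as the web page.
            System.out.println(locale + " -> " + format(1234.56, locale));
        }
    }
}
```

The returned String holds the right code points regardless of locale;
whether they survive the trip to the screen depends on the encoding used
when that String is turned into bytes.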

> But I'm thinking its a US/Eng only methodology... when applied to a web
> page.
> Do you think using getCurrencyInstance is generalizable in other languages?

Absolutely. The only reason $ is a magic symbol is because it's part of
US-ASCII and low enough in the symbol table so that it never gets
screwed up by incorrect encodings. Symbols like £ or € do not share that
luxury and are therefore error-prone when administrators poorly
configure their servers. It's further compounded by the fact that many
English-speaking coders forget that there are other people in the world. :(

> When you say.... "If I override that with say ISO-8859-15", is that the
> whole page you talking about, or it possible to have different character
> encoding sections in a web page.... thats another area thats confusing
> me now, because if I do look at that test page in a MS tool... it
> displays correctly with mixed encodings?

The encoding is for the entire document, not just a single character.
Basically, you sent an illegal character code. It would be like sending
6 bits of an 8-bit byte. In fact, that's /exactly/ what you did because,
to a UTF-8 renderer, your set of 8 bits looks like there should be
something else /before/ it in order to make it legal. Your server said
"hey, client... I'm gonna send you a bunch of oranges" and then went
right ahead and sent apples mixed-in with those oranges.
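
If you want to see why a UTF-8 decoder objects, here is a small sketch
(names are hypothetical) that checks the bit pattern of the byte and
then runs a strict decoder that reports errors instead of silently
substituting a replacement character:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx; they are
    // only legal after a lead byte that announces a multi-byte sequence.
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // True if the bytes form well-formed UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Pound = {(byte) 0xA3};            // £ in ISO-8859-1
        byte[] utf8Pound = {(byte) 0xC2, (byte) 0xA3}; // £ in UTF-8

        System.out.println(isContinuationByte(latin1Pound[0]));
        System.out.println(isValidUtf8(latin1Pound));
        System.out.println(isValidUtf8(utf8Pound));
    }
}
```

0xa3 is 10100011 in binary, so on its own it is a continuation byte with
no lead byte in front of it, which is why the strict decoder rejects it
and accepts the c2 a3 pair.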

> But when you choose a font in a text editor like Swing or Word, you are
> also picking some character set... and thats whats been injected into
> the page as its been formed...

Yes and no. Many encodings are limited by a particular character set
(for instance, US-ASCII is never going to have Sanskrit letters in it).
But that's why Unicode was invented: to make sure that anything we'd
ever possibly want to show on the screen is possible because we have
enough bits to display it. (My understanding is that the original 16-bit
Unicode was not big enough for everything, so it has since grown past 16
bits, but hey, they tried). The beauty
of UTF-8 is that every character you'd want to display has its own code
that nobody can steal -- regardless of the font being used.

The lesson is to always use UTF-8 and make sure you actually have
everything working properly. If your server is saying "utf-8" but the
character encoding on your servlet Writer is actually "ISO-8859-1" then
you haven't done your job and your web pages are going to look broken
when non-ASCII characters are thrown in there. The same is true if you
are serving static content (as I suspect you are in your example) and
advertising that it is "utf-8" but the file was written with ISO-8859-1
(or something else). (In your case, the problem is that text files
contain no explicit encoding information in them, so the server has to
guess -- or, more likely, there's no guessing going on, and the server
just blindly uses whatever its default has been configured to be.)
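
One way to keep the server from lying is to derive both the advertised
charset and the actual Writer encoding from a single constant. The
sketch below uses plain java.io so it runs on its own; the class and
method names are invented for illustration. In a servlet you would
typically get the same effect by calling response.setContentType() or
response.setCharacterEncoding() before response.getWriter().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsistentEncoding {
    // Single source of truth for what we advertise and what we write.
    static final Charset PAGE_CHARSET = StandardCharsets.UTF_8;

    // The value a server would send in the Content-Type header.
    static String contentTypeHeader() {
        return "text/html; charset=" + PAGE_CHARSET.name();
    }

    // Encode the body with the same charset that the header names.
    static byte[] renderBody(String html) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, PAGE_CHARSET)) {
            writer.write(html);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = renderBody("<p>\u00A31,234.56</p>");
        System.out.println(contentTypeHeader());
        System.out.println(body.length + " bytes");
        // Decoding with the advertised charset round-trips the text.
        System.out.println(new String(body, PAGE_CHARSET)
                .equals("<p>\u00A31,234.56</p>"));
    }
}
```

This does not help with static files, where the bytes on disk were fixed
by whatever editor saved them; there the file has to be re-saved in the
encoding the server advertises, or the server configured to advertise
the encoding the file really uses.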

> I screw up terminology... ok we all know that.... but
> Does Wil need to worry about the way he is doing it?... thats all I'm
> asking... I think so...

The short answer is no: Wil does not need to worry. If his code is
generating a proper € or £ then, as long as the server isn't lying
about the encoding, everything will be fine.

Unless the browser sucks. ;)

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjKmcgACgkQ9CaO5/Lv0PAIVACfT+P6XVbLFDngXT6+C5jEzAQ8
TXUAoKVtwsaijbpdfTY9mEISD7G4Ho+t
=35Pr
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

