tomcat-users mailing list archives

From Gregor Schneider <>
Subject Re: Tomcat 5 and UTF-8
Date Thu, 02 Apr 2009 17:54:03 GMT
On Thu, Apr 2, 2009 at 7:30 PM, Je suis la poubelle <> wrote:
> On Fri, Mar 27, 2009 at 5:34 PM, Christopher Schultz <> wrote:
> Setting charset/encoding is to specify computerized information.  It's
> not just a matter of language.  If setting charset in META tag doesn't mean
> anything to you, the same argument applies to setting charset in HTTP
> header.

Well, this is the only argument I can agree with.

But declaring the encoding of HTML/XML inside the document itself is a
question of which came first: the chicken or the egg?

I'll give you an example based on our dreadful experiences with XML-parsing:

Let's say we have a stream that starts like this:

<?xml version="1.0" encoding="UTF-8"?>

However, the whole stream is actually encoded in some weird
encoding you've never heard of.

See, the parser needs to know about the encoding /in advance/ to be
able to read the encoding from said stream.

See the point?
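
To make the problem concrete, here is a rough Java sketch. The class and
method names (XmlDeclSniffer, declaredEncoding) are made up for this
mail, and it is not how any particular parser works. It assumes the real
encoding is ASCII-compatible and reads the declaration on that guess.
Feed it the same declaration encoded as UTF-16 and the guess fails, so
it finds nothing:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class XmlDeclSniffer {
    // Matches an XML declaration at the very start and captures the encoding name.
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the declared encoding name, or null if no declaration is readable. */
    static String declaredEncoding(byte[] head) {
        int n = Math.min(head.length, 128);
        // Assumption: the stream is ASCII-compatible, so each byte of the
        // declaration maps 1:1 to a Latin-1 character.
        String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(1) : null;
    }
}
```

With the declaration from above, declaredEncoding returns "UTF-8" for
UTF-8 bytes, but null for the very same text encoded as UTF-16BE,
because every character is then preceded by a zero byte.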

It's good practice to declare the encoding in the document, but that's
about it, and the same goes for a META tag.

Talking web, the only thing a parser can rely on is the HTTP header.
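
For what it's worth, pulling the charset out of a Content-Type header
value is simple enough. The following is a simplified sketch (the class
name ContentTypeCharset is made up, and it does not handle every corner
of the header grammar, e.g. a semicolon inside a quoted value):

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
final class ContentTypeCharset {
    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

If this returns null, the header carries no charset and the client is
back to guessing.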

And it gets really nuts when it comes to UTF-8: are we talking about
UTF-8 with or without a BOM? Even the specs are not clear about that.
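
In practice that means checking for the three BOM bytes (EF BB BF)
yourself. A minimal sketch, with a made-up class name (Utf8Bom); as far
as I know Java's own UTF-8 decoder does not strip the BOM for you, it
hands it back as the character U+FEFF:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8Bom {
    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasBom(byte[] data) {
        return data.length >= 3
            && (data[0] & 0xFF) == 0xEF
            && (data[1] & 0xFF) == 0xBB
            && (data[2] & 0xFF) == 0xBF;
    }

    /** Decodes as UTF-8, skipping a leading BOM if one is present. */
    static String decode(byte[] data) {
        int offset = hasBom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```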

In my opinion, the whole character-set business is a pain in the ass:

I personally wish the IETF would come up with a spec saying something like

"the first n bytes of any stream have to be encoded in ASCII, containing
the length and encoding type of the rest of the stream".

I'm putting that on my wishlist for Xmas.
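
Just to show what I mean, here is a toy version of such a framing. To be
clear, this format does not exist anywhere: the layout (charset name,
a space, the payload length in bytes, a newline, all in ASCII) and the
names AsciiPreamble, wrap and unwrap are invented for this mail.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Purely hypothetical framing, not an existing standard.
final class AsciiPreamble {
    /** Prefixes the encoded text with an ASCII line: "<charset> <byte length>\n". */
    static byte[] wrap(String text, Charset cs) {
        byte[] payload = text.getBytes(cs);
        byte[] header = (cs.name() + " " + payload.length + "\n")
            .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + payload.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(payload, 0, out, header.length, payload.length);
        return out;
    }

    /** Reads the ASCII preamble first, then decodes the payload accordingly. */
    static String unwrap(byte[] stream) {
        int nl = 0;
        while (nl < stream.length && stream[nl] != '\n') {
            nl++;
        }
        if (nl == stream.length) {
            throw new IllegalArgumentException("missing preamble");
        }
        String[] fields = new String(stream, 0, nl, StandardCharsets.US_ASCII).split(" ");
        if (fields.length != 2) {
            throw new IllegalArgumentException("malformed preamble");
        }
        Charset cs = Charset.forName(fields[0]);
        int length = Integer.parseInt(fields[1]);
        if (length < 0 || length > stream.length - nl - 1) {
            throw new IllegalArgumentException("bad payload length");
        }
        return new String(stream, nl + 1, length, cs);
    }
}
```

The reader never has to guess: it decodes the first line as ASCII, and
only then touches the payload, whatever encoding that is in.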


just because you're paranoid, doesn't mean they're not after you...
gpgp-fp: 79A84FA526807026795E4209D3B3FE028B3170B2
gpgp-key available
