From Steve Hay
Subject [OT] Character encodings in web pages
Date Fri, 09 May 2003 08:19:12 GMT
[Apologies if this question is slightly off-topic, but I don't really 
know where else to post to.]

I have a problem reported by a German customer trying to view some web 
pages (mod_perl output) containing characters with diacritical markings. 
 The page views OK except for all such special characters, which appear 
garbled, e.g. a lowercase letter "a" with an umlaut (U+00E4) appears as 
an uppercase letter "A" with a tilde (U+00C3) followed by a currency 
sign (U+00A4).

The text concerned is read from a message file by the mod_perl content 
generator.  The web page that it is copied to is sent with the following 
two HTTP headers (amongst others):

Content-Type: text/html; charset=ISO-8859-1
Content-Language: de-DE

and, for good measure, also contains the following two <META> elements:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<meta HTTP-EQUIV="Content-Language" CONTENT="de-DE">

(If I run the message file concerned through "od -c" then the lowercase 
"a" umlaut is listed as "344".)
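
As a sanity check on that "od -c" reading, here is a minimal sketch in
Python (used purely for illustration; the helper name is made up). It
only confirms that octal 344 is the single byte 0xE4, which is the
ISO-8859-1 encoding of U+00E4, and that this byte on its own is not
valid UTF-8. It assumes nothing about the rest of the message file.

```python
# The octal value that "od -c" prints for the byte in the message file.
byte_value = int("344", 8)
raw = bytes([byte_value])

print(hex(byte_value))  # 0xe4

# Decode the single byte as ISO-8859-1 and show the resulting code point.
char = raw.decode("iso-8859-1")
print(f"U+{ord(char):04X}")  # U+00E4


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xE4 byte is an incomplete UTF-8 sequence, so this prints False.
print(is_valid_utf8(raw))
```

So the file itself appears to hold the character as a single
ISO-8859-1 byte, which matches the declared charset.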

I can't reproduce the problem myself: I've tried viewing the web page 
concerned in both NS7 and IE6, and the special characters all appear 
correctly.  (I have the whole setup (server & client) running on an 
English Windows system; the customer's is running on a German Windows 
system.)

Does anybody have any ideas what could be wrong, or suggestions of where 
else to ask?

I wonder if the fact that characters are apparently all rendered as a 
PAIR of characters, in which the first character is apparently always 
uppercase "A" tilde, is significant?
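
One way to test whether the pairing is significant is the following
minimal sketch in Python (used purely for illustration; the function
name is made up). It encodes a character as UTF-8 and then reads the
resulting bytes back as ISO-8859-1. The output would be consistent with
the text being converted to UTF-8 somewhere along the way and then
displayed as ISO-8859-1, but it does not show where such a conversion
might happen in my setup or the customer's.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# Lowercase "a" umlaut (U+00E4) becomes two characters.
garbled = misread_as_latin1("\u00e4")
code_points = [f"U+{ord(ch):04X}" for ch in garbled]
print(code_points)  # ['U+00C3', 'U+00A4']

# The German special characters: a/o/u umlaut (both cases) and sharp s.
german = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df"

# Collect the first character of each garbled pair.
first_chars = {misread_as_latin1(ch)[0] for ch in german}
print([f"U+{ord(ch):04X}" for ch in first_chars])  # ['U+00C3']
```

The first character of every pair comes out as U+00C3 (uppercase "A"
tilde), which is exactly the pattern the customer reports.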

Or is ISO-8859-1 simply the wrong character set for German?

Can any German speakers / HTTP gurus shed any light on this?



