httpd-users mailing list archives

From Jonas Eckerman <jonas_li...@frukt.org>
Subject Re: [users@httpd] Proxy garbles "special" characters
Date Mon, 06 Oct 2003 22:07:57 GMT
On Mon, 06 Oct 2003 23:33:22 +0200, Peter Bissmire wrote:

> é (&eacute;) is rendered as 'Ã©'
[...]
> www.granddictionnaire.com//btml/fra/r_motclef/fiche_accueil.asp

I fetched that document myself. You're right that the mangled
characters are double-byte characters. The question is: why do they
get mangled?

The mangling you described is exactly the effect you get if you view
that document as though it were in ISO-8859-1, or in Microsoft's very
similar Latin-1 code page (Windows-1252).

So it seems the characters do not get mangled, split, or anything
like that in the transfer. They are transferred perfectly fine, but
the charset info goes bad.
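
To illustrate, here is a minimal Python sketch of that effect. It
assumes the page really is sent as UTF-8, as its meta tag claims; the
variable names are only for illustration.

```python
# Assumption: the document is UTF-8 on the wire, as its meta tag claims.
original = "\u00e9"                          # the character é

wire_bytes = original.encode("utf-8")        # two bytes: 0xC3 0xA9
misread = wire_bytes.decode("iso-8859-1")    # each byte read as one character
print(repr(misread))                         # two characters instead of one

# The bytes themselves are intact: decoding them with the right
# charset recovers the original text.
recovered = misread.encode("iso-8859-1").decode("utf-8")
print(recovered == original)
```

Nothing is lost in transfer here; only the label used to decode the
bytes is wrong.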

> Possibly relevant html header is a meta, "content = "text/html;
> charset = UTF-8".

Yes, that's certainly relevant. And that's probably the charset your
browser should interpret the document as being in.

I take it that header is present in the document regardless of
whether you fetch the document directly or through the proxy?


> Telnetting yields following results.

Another way to see the headers is to use Mozilla (or Mozilla
Firebird) with the Live HTTP Headers plugin. Or use wget with the -s
(--save-headers) switch to get the headers saved with the file.

Both of these options have the advantage that you don't have to
manually type in a valid HTTP command. :-)

> From my server:
> HTTP/1.1 400 Bad Request
[...]
> Content-Type: text/html; charset=iso-8859-1

One problem here is that you sent a bad request. This means that the
"document" you got was your server's error document. It does not show
what your server does when you successfully (apart from this charset
problem) use it as a proxy.

If your proxy server does add that header when acting as a proxy, it 
would explain why the characters get mangled.

> So my server mentions a charset,
> IIS does not.
> I see no other significant difference

Does your server mention a charset when fetching a document from 
another server (when used as a proxy), even if that server doesn't?

If both servers mention a charset, do they specify the same charset?
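
One way to answer both questions is to fetch the same URL directly and
through the proxy and compare the charset in each Content-Type header.
The following is only a sketch using Python's standard library: the
function names are hypothetical, and the URL and proxy address are
whatever you pass in for your own setup.

```python
import urllib.request
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def fetch_content_type(url, proxy=None):
    """GET url, optionally via an HTTP proxy, and return its Content-Type."""
    # An empty mapping disables any proxy taken from the environment.
    proxies = {"http": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(url, timeout=10) as resp:
        return resp.headers.get("Content-Type")


def compare_charsets(url, proxy):
    """Return (direct_charset, proxied_charset) for the same URL."""
    direct = charset_of(fetch_content_type(url))
    proxied = charset_of(fetch_content_type(url, proxy=proxy))
    return direct, proxied
```

If compare_charsets() returns something like (None, 'iso-8859-1'), the
proxy is adding a charset that the origin server never sent.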

> But what does this mean?

If your server adds or changes the HTTP charset header when acting as
a proxy, it means the server is doing something it shouldn't do, and
that gives us a clue on where to continue the troubleshooting.

I forgot: are you using Apache 1 or 2?

Regards
/Jonas

-- 
Jonas Eckerman, jonas_lists@frukt.org
http://www.fsdb.org/


