www-apache-bugdb mailing list archives

From Dirk-Willem van Gulik <dirk.vangu...@jrc.it>
Subject Re: general/4280: Can not handle the proxy request for chinese directories or files
Date Thu, 22 Apr 1999 08:24:21 GMT


Minghua Chen wrote:
> 
> > Thanks a lot for all those details; it really rules out a few things. Unfortunately I
> > cannot access the link you gave me; I get a 502 or 500 error (both on the FTP and/or
> > HTTP port).
> >
> > We have done much the same with KOI, Big5 and UTF8 and I can vouch that it should work.
> >
> > Just a few questions;
> >
> >       Could you outline a few ports/IP addresses; i.e which machine fetches
> >       what over what port/protocol ? And let me know the URLs so I can
> >       have a look ?
> >
> >       What charset are you using; and if you are using a non-8 bit set, do
> >       you use something like UTF6, UTF7, UTF8 or PORB to encode things ?
> >
> >       And have you set anything like special mime types, or are using things
> >       like MIME/Base64pm to encode on the fly ?
> >
> > Thanks !
> >
> > Dw.
> >
> 
> Sure.
> 166.111.66.199 fetches some Chinese files on 166.111.4.80 over port 80 on
> 166.111.64.132. The 166.111.4.80 is a ftp testing site on C(hina)
> E(ducation) (and) R(esearch) NET(work), so perhaps it can not be access
> from outside of the CERNET. :( And I have not found another site like it.
> I will try to found one.
> 
> The charset I used is "ch-cn" ( in IE5.0) and gb2312 (in Netscape). I use
> UTF8 in IE5.0.
> 
> I am not sure whether the browser or the proxy server has used the MIME or
> something else to encode on the fly. Would you tell me how to know that?
> 
Well, I cannot reach the 66.199, but looking at the web server on 4.80 I notice
that it does not send out a language, encoding or charset.

So without access to your server, or a look at the configurations, I
cannot do much, except offer some general observations:

When apache proxies for HTTP it does not add any headers for language, charset
or encoding. For 'ftp' this is a different story; it relies on the mod_mime
(find_ct) to set some MIME values. These are guessed, usually from the mime.types
file or from the AddType values.

So my guess is that as you are not setting charset and encoding at the source
for HTTP, or in mod_mime for ftp, the browser does not know what to do.

Remember that it is absolutely essential to set the charset and the language
when you are sending out non-English/non-latin1 text in HTTP.

Try setting a DefaultType, or changing the .htm line in mime.types, to something like

	DefaultLanguage	zh
	DefaultType text/html;charset=utf8

And do not forget that in the mime.types file it is valid to specify things like

	text/plain;charset=utf8		.txt .TXT

Same is true for 'AddType'.

For 'ftp' you will always 'lose' as there is no real MIME coming from the
origin server. The web server has to guess. By carefully putting things into
the mime.types file you can usually get things close to right, especially
when also setting sensible defaults.
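
To make the guessing concrete, here is a rough sketch in Python of an
extension lookup with a default fallback. It is only an illustration of the
idea, not how mod_mime is actually implemented; the table entries and the
names TYPES_BY_EXTENSION, DEFAULT_TYPE and guess_content_type are made up
for the example.

```python
import posixpath

# Hypothetical table in the spirit of a mime.types file; the entries are
# illustrative assumptions, not Apache's shipped defaults.
TYPES_BY_EXTENSION = {
    "txt": "text/plain;charset=utf8",
    "htm": "text/html;charset=utf8",
    "html": "text/html;charset=utf8",
}

# Plays the role of DefaultType: used when the extension is unknown or absent.
DEFAULT_TYPE = "text/html;charset=utf8"


def guess_content_type(path):
    """Guess a Content-Type from the file extension, else use the default."""
    _, ext = posixpath.splitext(path)
    key = ext.lstrip(".").lower()  # ".TXT" and ".txt" map to the same entry
    return TYPES_BY_EXTENSION.get(key, DEFAULT_TYPE)


print(guess_content_type("/pub/readme.TXT"))
print(guess_content_type("/pub/some-directory/"))
```

An ftp resource with no recognisable extension (a directory listing, say)
falls through to the default, which is why a sensible default matters.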

A quick check to see if things are right is to do:

% telnet 166.111.4.80 80
Trying 166.111.4.80...
Connected to oans.cic.tsinghua.edu.cn.
Escape character is '^]'.
HEAD / HTTP/1.0<cr>
<cr>
HTTP/1.0 200 OK
Server: Microsoft-IIS/3.0
Date: Thu, 22 Apr 1999 07:48:09 GMT
Content-Type: text/html
Content-Length: 3590

Connection closed by foreign host.
%

Where '<cr>' is just pressing the return key. In the above, you did not
get back _any_ information on the charset, encoding or language of the
resource. The HTTP protocol says that if you do not know it you can
assume that it is latin1 (and English as a de facto case). This means
that your browser will get confused, or do random things, if it then
detects utf8 or koi in the text.
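
A small Python sketch of what that confusion looks like: the same bytes read
once as latin1 and once as utf8. The sample string is just an assumption for
the demonstration.

```python
text = "中文"                       # sample Chinese text, chosen for the demo
wire_bytes = text.encode("utf-8")   # the bytes the server actually sends

# A client that assumes latin1 turns every single byte into one character.
misread = wire_bytes.decode("latin-1")

# A client that was told charset=utf8 recovers the original text.
correct = wire_bytes.decode("utf-8")

print(len(text), len(misread))   # 2 characters become 6 unrelated ones
print(correct == text)           # True
```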

Instead it should say something like:

% telnet cils.ceo.org 80
Trying 10.0.0.9
Connected to cils.ceo.org.
Escape character is '^]'.
HEAD / HTTP/1.0<cr>
<cr>
HTTP/1.0 200 OK
Server: Apache/1.3.6 - dirkx/i18n
Date: Thu, 22 Apr 1999 07:49:19 GMT
Last-Modified: 22 Apr 1999 07:49:19 GMT
Content-Type: text/html; charset=utf8
Content-Length: 7581
Content-Language: zh

Connection closed by foreign host.
%

I.e. a charset and a language. Now your browser can render things properly.
Have a look at http://cet.middlebury.edu/Smitheram/charset/ for some examples.
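
If you would rather script the check than read the telnet output by eye,
something along these lines should do. It is a sketch using Python's standard
email parser on the header text; the function name describe_response is
invented here, and the two samples are shortened copies of the HEAD responses
shown above.

```python
from email.parser import Parser


def describe_response(raw_head):
    """Return (charset, language) from a raw HTTP response head; None if absent."""
    # Drop the status line; the rest is a block of "Name: value" header lines.
    _, _, header_block = raw_head.partition("\n")
    msg = Parser().parsestr(header_block, headersonly=True)
    charset = msg.get_param("charset", header="content-type")
    language = msg.get("Content-Language")
    return charset, language


# Shortened copies of the two responses from the telnet sessions.
IIS_HEAD = (
    "HTTP/1.0 200 OK\n"
    "Server: Microsoft-IIS/3.0\n"
    "Content-Type: text/html\n"
    "Content-Length: 3590\n"
)
APACHE_HEAD = (
    "HTTP/1.0 200 OK\n"
    "Server: Apache/1.3.6 - dirkx/i18n\n"
    "Content-Type: text/html; charset=utf8\n"
    "Content-Length: 7581\n"
    "Content-Language: zh\n"
)

print(describe_response(IIS_HEAD))     # (None, None)
print(describe_response(APACHE_HEAD))  # ('utf8', 'zh')
```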

Dw
