httpd-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: [users@httpd] Wrong charset convert
Date Wed, 01 Jul 2009 15:02:22 GMT
Jiří Eichler wrote:
...
Hi.
I do not know the precise answer either.
But I know enough to tell you that in such matters, you must be
/extremely/ careful in interpreting what is really going on at each level.
Just as a simple example: when you look at a log file, you must know:
- has the process that writes the logfile already transformed the data
into some encoding when writing it?
- is the editor that you are using aware of the logfile's encoding?
etc.
Otherwise what you see and what is really there may be different things.
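
To make the log file point concrete, here is a minimal Python sketch (Python is used only for illustration; the sample text and the view_as helper are made up for this example). It takes one fixed byte sequence and decodes it the way three differently configured viewers might.

```python
# The same bytes, as they might sit in a log file on disk.
raw = "Jiří".encode("utf-8")


def view_as(data: bytes, encoding: str) -> str:
    """Decode the bytes the way a viewer assuming `encoding` would."""
    return data.decode(encoding, errors="replace")


as_utf8 = view_as(raw, "utf-8")      # viewer guesses correctly
as_cp1250 = view_as(raw, "cp1250")   # viewer assumes a Central European codepage
as_latin1 = view_as(raw, "latin-1")  # viewer assumes ISO-8859-1

print(repr(raw))
for label, text in [("utf-8", as_utf8), ("cp1250", as_cp1250), ("latin-1", as_latin1)]:
    print(label, repr(text))
```

The bytes never change; only the interpretation does, which is why a hex dump of the file is usually a safer starting point than what an editor displays.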

For example, as far as I remember, the Windows NTFS filesystem stores
file names internally as Unicode (as UTF-16, not UTF-8).
(See for example here: http://www.ntfsrecovery.com/a-ntfs.php)
But when you look at a directory through the Explorer, these internal
filenames /may/ get transformed according to your PC's codepage, just to
display them to you.
So what you think you see is not necessarily what is really there.
Do you see what I mean?
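
A rough Python sketch of that effect, under the assumption that the name is stored as UTF-16 and then pushed through a codepage for display. The file name is hypothetical, and this only imitates the idea with Python's codecs; it does not call any Windows API, and the real Windows conversion may substitute characters differently.

```python
name = "Jiří.txt"  # hypothetical file name

# How the name might be stored internally: two bytes per character here.
stored = name.encode("utf-16-le")

# What a display layer limited to one codepage could show.
# cp1250 (Central European) contains both accented letters.
shown_1250 = name.encode("cp1250", errors="replace").decode("cp1250")
# cp1252 (Western European) has no "ř", so Python substitutes "?".
shown_1252 = name.encode("cp1252", errors="replace").decode("cp1252")

print(stored.hex(" "))
print(shown_1250)
print(shown_1252)
```

The stored bytes are identical in both cases; only the displayed form differs with the codepage.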

Just some elements:
- Apache should not "translate" or "encode" the received URL, because
basically it does not know if this URL is in UTF-8, ASCII, or any other
encoding. There is no "flag" or "header" in an HTTP request that says in
which encoding the "GET" line comes in (e.g. it may also be some
Japanese or Chinese encoding).
So it /must/ take it as bytes.
- then Apache calls the OS to find the file.  There may, or may not, be
some translation there, I really don't know.  It may depend on what API
call the program uses to read the directory, and I don't know what
Apache uses.
- it's the same for your C program.  I don't know if the OpenFile() call
interprets "name" as a pure byte sequence, or if it converts it
internally, or whatever.
- and we don't know if Apache and your program use the same API calls.
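
The first point can be sketched in Python as follows. The two request paths are invented for illustration; urllib.parse.unquote_to_bytes is the standard-library call that undoes percent-encoding without guessing a charset. Both paths name the same file, once with the client using UTF-8 and once using cp1250, and nothing in the request line says which is which.

```python
from urllib.parse import unquote_to_bytes

# Two hypothetical request paths for the same file name "Jiří.txt".
path_from_utf8_client = "/files/Ji%C5%99%C3%AD.txt"
path_from_cp1250_client = "/files/Ji%F8%ED.txt"

# Percent-decoding only yields bytes; no charset is implied.
raw_a = unquote_to_bytes(path_from_utf8_client)
raw_b = unquote_to_bytes(path_from_cp1250_client)

print(raw_a)
print(raw_b)

# Only with outside knowledge of the client's encoding do both
# byte sequences turn into the same text.
text_a = raw_a.decode("utf-8")
text_b = raw_b.decode("cp1250")
print(text_a == text_b)
```

A server that simply hands raw_a or raw_b to the filesystem will find the file only if the lower layers happen to expect that same byte sequence.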

For example, in Java or Perl, there are different ways to open a file
and to read/write from it, some with encoding/decoding going on, some
not.  Unfortunately, I have no experience with C and the Windows API, so
I don't know how it works in that case.
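
The same distinction exists in Python, which makes for a short illustration (a sketch only; the file name and contents are made up, and it says nothing about what the C or Windows calls do). Binary mode returns the bytes untouched, while text mode decodes them with whatever encoding it is told to use.

```python
import os
import tempfile

payload = "Jiří\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file
    with open(path, "wb") as f:
        f.write(payload)

    # No decoding: exactly the bytes that are on disk.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Decoding with the right encoding.
    with open(path, "r", encoding="utf-8", newline="") as f:
        as_text = f.read()

    # Decoding with a wrong guess: no error, just different characters.
    with open(path, "r", encoding="cp1250", errors="replace", newline="") as f:
        as_wrong = f.read()

print(repr(as_bytes), repr(as_text), repr(as_wrong))
```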

Obviously something is happening somewhere, and obviously it happens 
differently under Unix and under Windows.

Under Unix/Linux, most of these things are influenced by the "locale" 
under which the process is running.  Under Windows, it is usually the 
whole system-wide "International settings" which count.
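
As a rough way to see which settings a process actually runs under, the following Python sketch prints the encodings Python derives from the environment. It also lists a directory through both the text and the bytes form of the same call, to show that one API can come in a decoding and a non-decoding variant. The directory and file name are temporary and made up; the printed encodings depend on the machine.

```python
import locale
import os
import sys
import tempfile

# Encoding Python derives from the current locale or system settings.
preferred = locale.getpreferredencoding(False)
# Encoding Python uses to convert file names to and from bytes.
fs_encoding = sys.getfilesystemencoding()
print(preferred, fs_encoding)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "plain.txt"), "w") as f:
        f.write("x")

    as_text = os.listdir(tmp)                # str in, decoded names out
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes in, raw names out

print(as_text, as_bytes)
```

Running this under different LANG or LC_ALL values on Unix is one way to check how much the locale changes the result on a given system.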

I think we need an Apache/Windows developer here, to really tell us what 
is going on.


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

