httpd-dev mailing list archives

From "Martin Ramshaw" <>
Subject Re: [1.3 PATCH/QUESTION] Win32 ap_os_is_filename_valid()
Date Thu, 14 Mar 2002 20:08:01 GMT
Okay, let me kick some ideas out (I'm not a C guy, so they are pretty much
all I have to contribute - with the exception of the 'ab' patch that no one seems
to like - too bad, because there are practically no inexpensive benchmarking
tools out there, and 'ab' on Solaris looked like it could become the gold
standard).

As well, I am new to the community process, so bear with me if I am stepping
on any toes here. I am speaking of Apache 1.3 here, as Apache 2.0 will have
wide-character support.

These thoughts will probably result in flames, which will probably be deserved.

I actually have spent some time finding out about Microsoft Asian language
support, which was not easy, as the kremlin does not divulge useful details
very easily. I guess they just couldn't ignore a potential market that big.

Anyway, with Windows 2000/XP they have standardized on Unicode. As usual
it is their own personal brand of Unicode, so their code pages should be used
when trying to decipher it, but this is a big improvement over the previous
situation, where different versions of Windows stored information in different
encodings.
This has made Windows 2000/XP the de-facto standard for the Asian market,
and had the added plus that MS-format files are globally readable within the
Windows 2000/XP family. As well, they have good IMEs and full fonts available,
which makes IE the de-facto browser standard for the Asian market.
Believe me, it pains me to say good things about the evil empire, but their
Unicode support in Windows 2000/XP is impressive.

So for a version of Windows that doesn't use Unicode, everything will be fine
as long as only English is used. Lowercase bytes will be shifted to uppercase
(or vice-versa) and everything will be fine. Special characters (such as those
normally used for special purposes in *nix, such as the pipe) may cause some
minor problems. It might be a good idea to document how these bytes get
shifted, and what special characters may not be used in file/directory names
(I generally favour simplicity in my documentation).
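To make the byte shifting and the special-character restriction concrete, here is a minimal sketch. It is not what ap_os_is_filename_valid() does; the names `ascii_fold`, `reserved_chars` and `WIN32_RESERVED` are hypothetical and exist only for illustration. It assumes single-byte names and the usual Win32 list of characters that cannot appear in a file or directory name.

```python
# Characters that Win32 rejects inside a single file or directory name.
WIN32_RESERVED = set('<>:"/\\|?*')

def ascii_fold(name: bytes) -> bytes:
    """Shift only the bytes A-Z to lower case; all other bytes pass through."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in name)

def reserved_chars(name: str) -> list:
    """Return the reserved or control characters found in one name component."""
    return [c for c in name if c in WIN32_RESERVED or ord(c) < 0x20]

print(ascii_fold(b"Index.HTML"))
print(reserved_chars("a|b?.txt"))
```

Because only the range 0x41-0x5A is touched, bytes above 0x7F are left alone, which is the "everything will be fine as long as only English is used" case.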

For wide-characters, things will be more tricky. If only English is used, things
will probably be okay (I'm not sure about the null padding character). Case
shifting will have no unexpected side effects. This isn't true if a non-English
language is used, as a byte that only 'looked' like a byte that should be
case-shifted will actually get mangled.
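A small sketch of that mangling, assuming the wide characters are stored as little-endian UTF-16 and that the case shift is applied blindly, one byte at a time. The helper name `bytewise_lower` is hypothetical.

```python
def bytewise_lower(data: bytes) -> bytes:
    """Shift every byte in the range A-Z to lower case, ignoring the encoding."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)

english = "README".encode("utf-16-le")  # letters padded with 0x00 bytes
kana = "\u3042".encode("utf-16-le")     # bytes 0x42 0x30; 0x42 looks like 'B'

# English survives: the padding bytes are outside A-Z and stay untouched.
print(bytewise_lower(english).decode("utf-16-le"))

# The kana does not: 0x42 becomes 0x62, so U+3042 silently turns into U+3062.
print(hex(ord(bytewise_lower(kana).decode("utf-16-le"))))
```

The result is still valid UTF-16, so nothing fails loudly; the name simply refers to a different character.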

The solution I took when I was hacking the 'strings' utility was to ignore byte
order characters (ffff, fffe, feff), as Windows can be relied upon to be
little-endian. And then I threw away all of the padding characters (note that
this is only an option for English).
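Roughly, the approach described above could be sketched like this. It is a reconstruction under assumptions, not the actual patch: the input is taken to be little-endian 16-bit units, the function name `narrow_english_utf16le` is hypothetical, and units outside the 7-bit range are simply skipped, which is exactly why this only works for English.

```python
import struct

# 16-bit values treated as byte-order noise rather than text (an assumption).
IGNORED_UNITS = {0xFEFF, 0xFFFE, 0xFFFF}

def narrow_english_utf16le(data: bytes) -> bytes:
    """Drop byte-order units and 0x00 padding from little-endian wide text."""
    out = bytearray()
    even_length = len(data) & ~1  # ignore a trailing odd byte
    for (unit,) in struct.iter_unpack("<H", data[:even_length]):
        if unit in IGNORED_UNITS:
            continue
        if unit < 0x80:
            out.append(unit)  # high byte was padding, keep the low byte
        # Anything else is non-English text; this sketch just skips it.
    return bytes(out)

print(narrow_english_utf16le("\ufeffREADME.TXT".encode("utf-16-le")))
```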

When I tried to submit this change, it turned out that someone had already
submitted a much more elegant and functional change to handle wide characters
(like I said, I'm not a C guy). Amongst other things, it handled big-endian byte
orderings as well as little-endian. You could have a look at the latest version
of the 'strings' code if you wanted some ideas on how to handle wide characters.
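For illustration only, handling both byte orders usually comes down to checking the byte-order mark before decoding. This sketch is not taken from the 'strings' source; `decode_wide` is a hypothetical helper, and falling back to little-endian when no mark is present is an assumption based on the Windows case discussed above.

```python
import codecs

def decode_wide(data: bytes, default: str = "utf-16-le") -> str:
    """Decode 16-bit text, choosing the byte order from the BOM if present."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    return data.decode(default)  # no BOM: assume the default order

print(decode_wide(codecs.BOM_UTF16_BE + "index".encode("utf-16-be")))
print(decode_wide("index".encode("utf-16-le")))
```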

Anyhow, that's all I have to say on this subject. I promise to say no more, so
any flames.

