httpd-dev mailing list archives

From: "William A. Rowe, Jr."
Subject: Re: [1.3 PATCH/QUESTION] Win32 ap_os_is_filename_valid()
Date: Thu, 14 Mar 2002 06:49:36 GMT
At 12:29 AM 3/14/2002, you wrote:
> >Apache 1.3 on Win32 assumes that the names of files served are
> >comprised solely of characters from character sets which are a superset
> >of ASCII, such as UTF-8 or ISO-8859-1.
>Umm, I assume that ASCII as you refer to it is its 7-bit incarnation.
>Note that _all_ character sets are supersets of 7-bit ASCII, and most
>are supersets of 8-bit ASCII (the exceptions being the various other
>'latin' encodings - i.e. ISO-8859-2 through ISO-8859-16 which differ
>in the various 'special' characters).
>This has the lovely side-effect that English is always an option,
>regardless of the actual encoding being used.

Uhmm... you are only partially correct.

Yes - 7-bit ASCII survives unblemished in nearly all character sets;
the exceptions are certain multibyte encodings.

Some encodings are 7-bit clean, that is, their non-ASCII characters
never use bytes in 0x00-0x7F.  Examples are utf-8 and most European
encodings.  Counterexamples, however, include many Asian sets such
as shift-JIS, where bytes in 0x00-0x7F alternate in meaning between
their ASCII encoding and shifted-state bytes.  The user of the
Chinese character set who first commented on this in the bug reports
ran into exactly this problem with certain shifted character
combinations.
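
To make the 7-bit-clean distinction concrete, here is a minimal sketch in
Python (not Apache code).  It assumes cp932, Microsoft's Shift-JIS variant,
as the example encoding, and the helper names find_byte_hits and
find_char_hits are made up for illustration.  It shows a byte-wise scan for
a backslash firing on the second byte of a double-byte character, while the
same name encoded as utf-8 produces no false hit.

```python
def find_byte_hits(name, encoding, needle=b"\\"):
    """Return byte offsets where `needle` occurs in the encoded name."""
    raw = name.encode(encoding)
    return [i for i in range(len(raw)) if raw[i:i + 1] == needle]


def find_char_hits(name, needle="\\"):
    """Return character offsets where `needle` occurs in the decoded name."""
    return [i for i, ch in enumerate(name) if ch == needle]


# Hypothetical file name; U+8868 encodes to bytes 0x95 0x5C in cp932,
# and 0x5C is also the ASCII backslash.
sample = "表.html"

print(sample.encode("cp932"))            # b'\x95\\.html'
print(find_byte_hits(sample, "cp932"))   # [1]  (false hit inside the character)
print(find_byte_hits(sample, "utf-8"))   # []   (utf-8 is 7-bit clean)
print(find_char_hits(sample))            # []   (the name has no backslash)
```

Any check that walks the raw bytes looking for path separators or other
ASCII metacharacters is exposed to this kind of false match unless it
tracks lead and trail bytes for the active code page.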

>I think you've missed the boat on this one. Asian versions of Windows will
>all probably use characters that you don't consider as ASCII (i.e. they will
>be wide - actually Microsoft have done a pretty good job of this).

No... Jeff didn't miss anything.  Not only is this an issue with
encodings that are not 7-bit clean, but the 8-bit encodings are not
normalized correctly on Win32, and Win32 is case insensitive.
Essentially, any Files or Directory blocks that users rely on to
protect file paths containing 8-bit characters don't match correctly,
even for Windows-1252 or OEM-437.  Those are the sad but accurate facts.
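
The 8-bit case problem can be sketched the same way.  This is an
illustration of the class of bug, not a reproduction of Apache's
comparison code: the function names are hypothetical, and Python's Unicode
lower() merely stands in for whatever case mapping the filesystem really
applies.  An ASCII-only case fold leaves bytes at or above 0x80 untouched,
so two spellings that differ only in the case of an accented letter
compare as different names.

```python
def ascii_only_equal(a, b):
    """Compare byte strings after folding only A-Z, as an ASCII-only
    tolower() would; bytes >= 0x80 are left untouched."""
    return a.lower() == b.lower()


def charset_aware_equal(a, b, encoding):
    """Decode with the given code page, then fold case on characters."""
    return a.decode(encoding).lower() == b.decode(encoding).lower()


for codepage in ("cp1252", "cp437"):
    upper = "ÉCOLE".encode(codepage)   # E-acute is 0xC9 (cp1252), 0x90 (cp437)
    lower = "école".encode(codepage)   # e-acute is 0xE9 (cp1252), 0x82 (cp437)
    print(codepage,
          ascii_only_equal(upper, lower),               # False
          charset_aware_equal(upper, lower, codepage))  # True
```

A case-insensitive filesystem that treats both spellings as the same file
will happily serve it under either name, while a path pattern compared
with the ASCII-only fold matches just one of them.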

For tolower/toupper/strcasecmp, I will have a patch sometime this
month to trust utf-8 and normalize appropriately, using the Win32
API which gives us some greater assurance that the mappings
correspond to filename processing semantics.  For 2.0, of course.
