httpd-dev mailing list archives

From "William A. Rowe, Jr." <>
Subject Re: [1.3 PATCH/QUESTION] Win32 ap_os_is_filename_valid()
Date Thu, 14 Mar 2002 03:22:04 GMT
At 04:50 PM 3/13/2002, you wrote:
>"Roy T. Fielding" <> writes:
> > On Wed, Mar 13, 2002 at 02:12:18PM -0500, Jeff Trawick wrote:
> > > Jeff Trawick <> writes:
> > >
> > > I think this is an accurate statement regarding the use of non-ASCII
> > > characters in filenames with Apache 1.3 on Win32.  Comments?
> > >
> > > -------------------cut here------------------
> > > Names of file-based resources with Apache 1.3 on Win32
> > >
> > > Apache 1.3 on Win32 assumes that the names of files served are comprised
> > > solely of characters from the US-ASCII character set.  It has no logic to
> > > determine whether or not a possible file name contains invalid non-ASCII
> > > characters.  It has no logic to properly match actual non-ASCII file
> > > names with names specified in the Apache configuration file.  Because Apache
> > > does not verify that the characters in file names are all ASCII,
> > > files containing various non-ASCII characters in their names can be
> > > successfully served by Apache.  However, this is not recommended for the
> > > following reasons:
> >
> > No, it doesn't.  It treats all names as raw bytes, regardless of charset,
> > but the filtering process of preventing some filesystem-specific magic
> > characters from creating security holes on a server prevents unfiltered
> > 16-bit Unicode or similar wide character sets from being used
> > directly.  This is true in general for the Web -- wide character encodings
> > are not allowed to appear in URI under any circumstances.

Wrong.  Unix may allow any filename character excepting a '\0', but
Win32 does not.  Win32 filenames cannot be treated as 'raw bytes'.

Except that we have used utf-8 folding on -all- filename resources
through APR in 2.0.  Effectively, all APR Win32 API calls are made
in Unicode, using utf-8 -> unicode folding [strict, as opposed to Microsoft's
preference for ignoring the security impact of accepting overlong,
too-many-byte sequences].

So no Unicode filename in 2.0 is inaccessible, but local code pages
cannot be used, since they are ambiguous.  Utf-8 is unambiguous, and
therefore ideal for this application.
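The strict utf-8 -> unicode folding described above can be sketched in C. This is a minimal sketch, not APR's actual code, and the function name utf8_to_utf16_strict is hypothetical. It assumes NUL-terminated utf-8 input and rejects overlong (too-many-byte) forms, UTF-16 surrogate code points, and values above U+10FFFF, so every accepted name has exactly one byte spelling. That single-spelling property is what makes utf-8 unambiguous where local code pages are not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: strictly convert NUL-terminated UTF-8 to UTF-16.
 * Returns the number of UTF-16 units written (not counting the terminator),
 * or -1 if the input is malformed or overlong, or the buffer is too small. */
static long utf8_to_utf16_strict(const char *in, uint16_t *out, size_t outlen)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra, i;

        if (*s < 0x80)      { cp = *s;        extra = 0; }
        else if (*s < 0xC2) { return -1; }    /* stray continuation, or 0xC0/0xC1 overlong lead */
        else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }
        else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }
        else if (*s < 0xF5) { cp = *s & 0x07; extra = 3; }
        else                { return -1; }    /* 0xF5..0xFF never occur in UTF-8 */
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)          /* also stops at the terminating NUL */
                return -1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Overlong check: 3- and 4-byte forms have a minimum code point. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000))
            return -1;
        /* Surrogates and anything past U+10FFFF are not valid scalar values. */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= outlen) return -1;   /* keep room for the terminator */
            out[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= outlen) return -1;
            cp -= 0x10000;                    /* encode as a surrogate pair */
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    if (n >= outlen) return -1;
    out[n] = 0;
    return (long)n;
}
```

The UTF-16 result is the form that the wide (W-suffixed) Win32 file APIs, such as CreateFileW(), accept.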

As for Roy's other comment on a case-insensitive directive: that doesn't
work as long as different OSes and network layers perform local code
page comparisons differently.  Canonicalization is the only way to know
that the file you've open()ed or stat()ed matches a given file that may
or may not be protected by our directives, including that case-sensitivity
flag buried in the <Directory> block, as suggested :-)

A user recently observed in a bug report that Microsoft has tacked their
favorite U+FEFF unicode prefix (the byte order mark) onto all utf-8 text
files [folded into utf-8, of course, as the bytes EF BB BF].  While it's BS
and preferable to ignore it, I will likely get a patch in that skips those
bytes when parsing httpd.conf or .htaccess files.  This ensures that
security configured in utf-8 files is honored as well.
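The kind of patch described above can be sketched as follows. This is a minimal illustration, not the actual httpd change; the helper name skip_utf8_bom is hypothetical. It assumes the caller hands it the first line (or first buffer) read from httpd.conf or .htaccess, and drops a leading EF BB BF before the directive parser sees it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if buf starts with the UTF-8 encoding of U+FEFF
 * (EF BB BF), return a pointer just past it; otherwise return buf as-is.
 * Intended to be applied only to the start of a config file. */
static const char *skip_utf8_bom(const char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    if (len >= sizeof bom && memcmp(buf, bom, sizeof bom) == 0)
        return buf + sizeof bom;
    return buf;
}
```

Checking only the start of the file keeps a U+FEFF that appears elsewhere from being silently discarded.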

The last remaining bit of the utf-8 exercise is to eliminate the last
of the 'tolower() is a good enough comparison' and 'strcasecmp()
always works' assumptions.  Even on unix, these functions always operate
in the local code page, and we want them to map to a utf-8
comparison.  Worse, we want them to map to the system's internal
mappings, so that we are on the same page as the file system
when it comes to case folding.
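The locale problem with tolower() and strcasecmp() can be illustrated with a minimal, locale-independent comparison that folds only ASCII 'A'-'Z' and requires every byte 0x80-0xff to match exactly. The name ascii_strcasecmp is hypothetical. This sketch never mangles utf-8 sequences the way a locale-driven tolower() might, but it also performs no Unicode case folding: "Été" and "été" still compare unequal. Matching the file system's own notion of case would still require the system's internal mappings, as described above.

```c
/* Hypothetical helper: case-insensitive comparison that folds only ASCII
 * letters and never consults the current locale.  Bytes 0x80-0xff,
 * including every byte of a multi-byte UTF-8 sequence, must match exactly. */
static int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (;;) {
        unsigned char c1 = *p++, c2 = *q++;

        /* Fold 'A'..'Z' to 'a'..'z' arithmetically, never via tolower(). */
        if (c1 >= 'A' && c1 <= 'Z') c1 += 'a' - 'A';
        if (c2 >= 'A' && c2 <= 'Z') c2 += 'a' - 'A';

        if (c1 != c2 || c1 == '\0')
            return c1 - c2;
    }
}
```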

In 1.3 we were not on the same page: the msvcrt treats the
0x80-0xff chars differently than the file system does [another
security-related reason they are really unsupported].  I'd like that issue
closed up for good in 2.0.  Canonicalization has helped a ton, but
there is still a bit of ambiguity in them thar strings.


> > The solution is to use UTF-8 encoding for non-ASCII characters and not
> > allow any access via wide character function calls.
>Thanks a bunch for your response.  I'm more than a little unclear on this.
>Regarding your key comment "treats all file names as raw bytes,
>regardless of charset"...
>I would agree with that for Unix, but on Win32, in an attempt to match
>the semantics of the native filesystem (case preserving but not case
>significant), Apache will perform case transformations on file names*.
>This, along with the filtering code to check for specific ASCII
>values, is why I claimed that it assumes ASCII.
>*see ap_os_canonical_filename(), which is used to generate r->filename
>Jeff Trawick |
>Born in Roswell... married an alien...
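The "filtering code to check for specific ASCII values" mentioned above can be made concrete with a hypothetical sketch. It is not Apache's ap_os_is_filename_valid(), and the name segment_is_valid_win32 is invented for illustration. It checks a simplified subset of the documented Win32 naming rules for one path segment: no control characters, none of < > : " / \ | ? *, no trailing dot or space (Win32 silently strips those), and none of the reserved device names, with or without an extension. Note that it inspects only ASCII values and lets bytes 0x80-0xff through unexamined, which is exactly the "assumes ASCII" behavior under discussion.

```c
#include <string.h>

/* Hypothetical sketch, not Apache's ap_os_is_filename_valid(): returns 1 if
 * one path segment is an acceptable Win32 file name, 0 otherwise.  Only ASCII
 * values are inspected; bytes 0x80-0xff pass through unexamined. */
static int segment_is_valid_win32(const char *seg)
{
    static const char *const devices[] = {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };
    size_t len = strlen(seg);
    size_t base, i, j;

    if (len == 0)
        return 0;

    /* Control characters and the characters Win32 reserves in names. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)seg[i];
        if (c < 0x20 || strchr("<>:\"/\\|?*", c) != NULL)
            return 0;
    }

    /* Win32 strips a trailing dot or space, so "a.txt." would alias "a.txt".
     * This also rejects "." and "..", which a caller should handle first. */
    if (seg[len - 1] == '.' || seg[len - 1] == ' ')
        return 0;

    /* Device names are reserved with or without an extension ("con.txt"). */
    base = strcspn(seg, ".");
    for (i = 0; i < sizeof devices / sizeof devices[0]; i++) {
        const char *d = devices[i];
        if (strlen(d) != base)
            continue;
        for (j = 0; j < base; j++) {
            char c = seg[j];
            if (c >= 'a' && c <= 'z')
                c = (char)(c - ('a' - 'A'));  /* ASCII-only upper-casing */
            if (c != d[j])
                break;
        }
        if (j == base)
            return 0;
    }
    return 1;
}
```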
