httpd-dev mailing list archives

From Jeff Trawick <traw...@attglobal.net>
Subject Re: [1.3 PATCH/QUESTION] Win32 ap_os_is_filename_valid()
Date Wed, 13 Mar 2002 22:50:49 GMT
"Roy T. Fielding" <fielding@apache.org> writes:

> On Wed, Mar 13, 2002 at 02:12:18PM -0500, Jeff Trawick wrote:
> > Jeff Trawick <trawick@attglobal.net> writes:
> > 
> > > This function is checking for several characters which, at least in
> > > ASCII, are supposedly not valid characters for filenames.  But some of
> > > these same characters can appear in valid non-ASCII filenames, and the
> > > logic to check for these characters breaks Apache's ability to serve
> > > those files.
> > > 
> > > A user reported the inability to request a file with the Chinese
> > > character %b5%7c in the name.  The %7c byte tripped up the check for
> > > invalid ASCII characters.
> > 
> > I think this is an accurate statement regarding the use of non-ASCII
> > characters in filenames with Apache 1.3 on Win32.  Comments?
> > 
> > -------------------cut here------------------
> > Names of file-based resources with Apache 1.3 on Win32
> > 
> > Apache 1.3 on Win32 assumes that the names of files served are comprised 
> > solely of characters from the US-ASCII character set.  It has no logic to
> > determine whether or not a possible file name contains invalid non-ASCII
> > characters.  It has no logic to properly match actual non-ASCII file names 
> > with names specified in the Apache configuration file.  Because Apache
> > > does not verify that the characters in file names are all ASCII,
> > > files containing various non-ASCII characters in their names can be
> > successfully served by Apache.  However, this is not recommended for the
> > following reasons:
> 
> No, it doesn't.  It treats all names as raw bytes, regardless of charset,
> but the filtering process of preventing some filesystem-specific magic
> characters from creating security holes on a server prevents the use
> of unfiltered 16-bit Unicode or similar wide character sets from being used
> directly.  This is true in general for the Web -- wide character encodings
> are not allowed to appear in URI under any circumstances.
> 
> The solution is to use UTF-8 encoding for non-ASCII characters and not
> allow any access via wide character function calls.

Thanks a bunch for your response.  I'm more than a little unclear on this
stuff.

Regarding your key comment "treats all names as raw bytes,
regardless of charset"...

I would agree with that for Unix, but on Win32, in an attempt to match
the semantics of the native filesystem (case preserving but not case
significant), Apache will perform case transformations on file names*.
This, along with the filtering code to check for specific ASCII
values, is why I claimed that it assumes ASCII.

*see ap_os_canonical_filename(), which is used to generate r->filename
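
To make the two concerns concrete, here is a minimal sketch in C.  It
is not the Apache source: bytewise_name_ok() and bytewise_lowercase()
are hypothetical names, the rejected-character set is only
illustrative, and plain ASCII lowercasing is a simplified stand-in for
whatever case transformation ap_os_canonical_filename() actually
performs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-wise filter (not the real ap_os_is_filename_valid()).
 * Returns 1 if no byte of name equals one of the rejected ASCII
 * characters, 0 otherwise.  The set below is illustrative only. */
int bytewise_name_ok(const unsigned char *name, size_t len)
{
    static const char rejected[] = "<>|\"?*";
    size_t i;

    for (i = 0; i < len; i++) {
        /* Skip NUL explicitly: strchr() would match the terminator. */
        if (name[i] != '\0' && strchr(rejected, name[i]) != NULL) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical byte-wise ASCII case folding, done in place.
 * Returns the number of bytes that were changed. */
size_t bytewise_lowercase(unsigned char *name, size_t len)
{
    size_t i;
    size_t changed = 0;

    for (i = 0; i < len; i++) {
        if (name[i] >= 'A' && name[i] <= 'Z') {
            name[i] = (unsigned char)(name[i] + ('a' - 'A'));
            changed++;
        }
    }
    return changed;
}
```

Given the two bytes 0xb5 0x7c from the user's report,
bytewise_name_ok() returns 0, because the second byte has the same
value as ASCII '|'.  Given a two-byte sequence whose second byte is
0x41, such as 0xa4 0x41, bytewise_lowercase() rewrites that byte to
0x61, so the name no longer contains the original character.  Neither
effect occurs with a UTF-8 sequence such as 0xe4 0xb8 0x80, since every
byte of a multi-byte UTF-8 character is 0x80 or higher.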

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...
