httpd-dev mailing list archives

From Ben Hyde <bh...@pobox.com>
Subject Proposal for URI->filename mapping doc.
Date Mon, 06 Jul 1998 13:50:46 GMT

If the implementation matched this doc would it be
safe, useful, and a good thing?  - ben

---

There are many pitfalls in configuring a web server.
In this section we discuss how the naming of file system
objects is managed.  When objects have more than one name,
care must be taken to ensure that the same access is granted
regardless of which name is used for the object.

Windows is particularly delicate in this regard.  A few
examples of pairs that name the same file will illustrate
this:
  a. "foo"          "foo."
  b. "..."          "..\.."
  c. "foo::$DATA"   "foo"
  d. " foo  "       " foo" 
  e. "CaSe"         "case"

To manage these pitfalls Apache carefully normalizes the
names given in requests before checking the configured
security.  

When configuring the server it is critical that you
use the normalized name for things.  The server assumes
you provided normalized names in the configuration
and uses case-sensitive string comparison for much
of its checking.

The URI to internal name mapping proceeds as follows.

 - Prepend the DocumentRoot.

   The DocumentRoot is prepended to the resource
   identifier given in the request.

 - All % encoding is removed from the incoming URL.

   This removes the syntax known as "escape" in RFC2068,
   but it may be reintroduced in the following steps.
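
   A minimal sketch of this step in Python.  The function name is
   hypothetical, and decoding each escaped byte as Latin-1 (one byte,
   one character) is an assumption, not something this proposal
   specifies.  Note that the name is decoded exactly once.

```python
from urllib.parse import unquote


def remove_percent_encoding(uri_path):
    """Decode %XX escapes a single time.

    Latin-1 is assumed so that every escaped byte maps to exactly
    one character and no byte sequence is rejected or replaced.
    """
    return unquote(uri_path, encoding="latin-1")


if __name__ == "__main__":
    print(remove_percent_encoding("/%3Cfoo%3E/"))
```

   Decoding only once matters: "%2541" becomes "%41", not "A", so a
   doubly encoded name cannot slip a character past the later checks.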

 - Eliminate URIs containing "illegal" characters.

   The server may be configured to prevent any
   handling of names containing certain characters.
   This set is configurable via IllegalCharacters.
   For example, you may want to preclude any control
   characters in incoming URIs:
    IllegalCharacters %00-%20 ...etc...

   The default for IllegalCharacters is platform
   specific.  On Windows these eight
   characters are illegal by default:
      \  ?   <
         "   >
      :  *   |

   These may be enabled, but see the next section.

   For example, "/<foo>/" would be rejected with an error
   if IllegalCharacters includes %3C for the less-than
   character.
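
   A sketch of this check in Python.  The range syntax (%00-%20) is
   taken from the example above.  The function names and the exact
   parsing rules are assumptions, since IllegalCharacters is only a
   proposed directive.

```python
def parse_char_set(spec):
    """Parse a spec such as "%00-%20 %3C" into a set of characters.

    Each whitespace-separated token is either one %XX value or a
    %XX-%YY range (inclusive).  This syntax is assumed, not specified.
    """
    chars = set()
    for token in spec.split():
        low, _, high = token.partition("-")
        start = int(low.lstrip("%"), 16)
        end = int(high.lstrip("%"), 16) if high else start
        chars.update(chr(code) for code in range(start, end + 1))
    return chars


def has_illegal_chars(name, illegal):
    """Return True if the decoded name contains any illegal character."""
    return any(ch in illegal for ch in name)


if __name__ == "__main__":
    illegal = parse_char_set("%00-%20 %3C")
    print(has_illegal_chars("/<foo>/", illegal))
```

   The check has to run on the decoded name, otherwise "%3C" in the
   request would hide the "<" from it.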

 - Troublesome characters are encoded.

   The server may be configured to keep certain
   characters, which you want to allow in resource
   identifiers, from being used in the file system.
   To keep these out of the file system they are
   encoded using the "escape" syntax of RFC2068.

     "/<%foo>/"  -becomes-> "/%3c%25foo%3e/"

   The set of these characters may be configured
   via EscapeCharactersForFileSystem.  The
   default is platform specific.

   On Windows the eight characters:
      \  ?   <
         "   >
      :  *   |
   must be included in any setting given
   to EscapeCharactersForFileSystem.

   The intent of this pass is to allow any legal
   object name to be mapped into the file system
   when necessary, for example when migrating a
   case-sensitive Unix file set to a less
   case-sensitive Windows file system.

   This encoding may be disabled entirely by directing
     EscapeCharactersForFileSystem none
   but in that case, on Windows, the characters
   above must be declared illegal.
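
   A sketch of the encoding pass in Python.  The constant and function
   names are hypothetical.  Always escaping "%" itself is an
   assumption drawn from the example above (where "%" becomes "%25"),
   and it keeps the encoding reversible.  Lowercase hex digits follow
   the same example.

```python
# The eight characters that the proposal requires on Windows.
WINDOWS_REQUIRED = frozenset('\\?<">:*|')


def escape_for_file_system(name, escape_chars=WINDOWS_REQUIRED):
    """Re-encode troublesome characters as %xx before touching the file system.

    "%" is always escaped (an assumption) so that a literal "%" in a
    name cannot be confused with an escape produced by this pass.
    Single-byte characters are assumed.
    """
    to_escape = set(escape_chars) | {"%"}
    return "".join(
        "%{:02x}".format(ord(ch)) if ch in to_escape else ch
        for ch in name
    )


if __name__ == "__main__":
    print(escape_for_file_system("<%foo>"))
```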

 - Normalize Filesystem Case [Windows only]

   Lowercase the requested name:
    "/FOO/Bar.html"  -becomes-> "/foo/bar.html"

   A file on Windows may be denoted by any variation 
   in case (e.g. "foo", "Foo", and "FOO" all name the
   same file).  Only one variation is stored in the
   file system but we ignore this canonical 
   casification. [[In the best of all possible worlds
   the response would show this "spelling" but it
   is expensive to walk the path and tease it out
   of the file system just for that cosmetic advantage.
   Some directives might be a good idea, e.g.
    CasifyResponseURI yoyodyne YoyoDyne
    CasifyAllResponseURI yes]]

   Note that on Windows you should use lowercase throughout
   your configuration directives, otherwise they
   will match none of the requests.
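
   A small Python illustration of why mixed-case configuration never
   matches.  Both function names are hypothetical, and the prefix
   comparison is only a stand-in for whatever matching the Directory
   directive really performs.

```python
def normalize_case(name):
    """Windows-only step: lowercase the whole requested name."""
    return name.lower()


def directory_matches(configured_dir, internal_name):
    """Case-sensitive match of a configured directory against a name.

    The name matches if it is the directory itself or lies below it.
    """
    base = configured_dir.rstrip("/")
    return internal_name == base or internal_name.startswith(base + "/")


if __name__ == "__main__":
    requested = normalize_case("/FOO/Bar.html")
    print(directory_matches("/foo", requested))
    print(directory_matches("/Foo", requested))
```

   Because the request is lowercased before the comparison, a
   directive written as "/Foo" can never match it.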

 - Normalize Filesystem Syntax

   The following rewrites are done.
   - (On Unix (and Windows?)) Runs of multiple slashes are collapsed:
       "/foo///bar//gum.txt" becomes
       "/foo/bar/gum.txt"

   - (On Windows) Trailing spaces are eliminated 
     from any step on the path.
       "\foo \ bar    \gum  " becomes
       "\foo\ bar\gum"
     
     This may be enabled on Unix as well via
     DropTrailingSpaces, but it may not be disabled
     on Windows.

   - (On Windows) Eliminate the superfluous period on
     null extensions.
        "/.../foo./bar.a/gum" becomes
        "/.../foo/bar.a/gum"

   - (On Windows) Restrict the syntax for reaching
     ancestors; violations generate an error.

        "/.../foo.txt" is an error.
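
   The four rewrites above could be sketched in Python as follows.
   This assumes the name already uses forward slashes, applies the
   Windows rules unconditionally, and treats any path step made only
   of three or more periods as the forbidden ancestor syntax.  Those
   choices, the function name, and leaving "." and ".." untouched are
   all assumptions.

```python
import re


def normalize_syntax(name, drop_trailing_spaces=True):
    """Apply the syntax rewrites to a forward-slash name."""
    steps = []
    for step in name.split("/"):
        if drop_trailing_spaces:
            step = step.rstrip(" ")
        only_dots = step != "" and set(step) == {"."}
        if only_dots and len(step) >= 3:
            # "..." style ancestor syntax is rejected outright.
            raise ValueError("illegal ancestor syntax: %r" % step)
        if step.endswith(".") and not only_dots:
            # "foo." names the same file as "foo"; drop the period.
            step = step[:-1]
        steps.append(step)
    # Collapse runs of slashes, including any left by emptied steps.
    return re.sub(r"/+", "/", "/".join(steps))


if __name__ == "__main__":
    print(normalize_syntax("/foo///bar//gum.txt"))
    print(normalize_syntax("/foo / bar    /gum  "))
```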

The resulting name is used as the internal name for the object denoted.
For example, this name is used in all matching against Limit and
Directory directives.

Only names that would pass thru this process unchanged should appear
in configuration files.
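
One way to check a configuration against this rule is to run each
configured name through the same process and flag any name that
changes.  The Python sketch below assumes such a check would be
useful; the function names are hypothetical, and stand_in_normalize
covers only lowercasing and slash collapsing, not the full process
described above.

```python
import re


def unnormalized_names(names, normalize):
    """Return the configured names that the normalizer would change."""
    return [name for name in names if normalize(name) != name]


def stand_in_normalize(name):
    """Partial stand-in: lowercase, then collapse runs of slashes."""
    return re.sub(r"/+", "/", name.lower())


if __name__ == "__main__":
    configured = ["/foo/bar", "/Foo/bar", "/foo//bar"]
    for name in unnormalized_names(configured, stand_in_normalize):
        print("not normalized:", name)
```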

When it becomes necessary to convert this name to an actual
filename the following rewrites are done.  These are done
only on Windows, to create a UNC name.  Apache never uses
the disk syntax (e.g. "C:\foo\bar.txt").

 - an additional slash is added to the front of the name.
    "/f/foo/bar/gum.html" -becomes-> "//f/foo/bar/gum.html"

 - all the slashes are converted to backslashes:
    "//f/foo/bar/gum.html" -becomes-> "\\f\foo\bar\gum.html"

 - an OS call is made to check that this name denotes a simple
   file or directory, and not a device or the machine, e.g.
   CON1, or NUL.
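
The first two rewrites can be sketched in Python as below.  The
function name is hypothetical, and the OS call that rules out devices
is deliberately left out because this text does not say which call
is meant.

```python
def to_unc(internal_name):
    """Turn an internal name such as "/f/foo/bar" into a UNC name."""
    if not internal_name.startswith("/"):
        raise ValueError("internal names are expected to start with '/'")
    # Add one more leading slash, then switch every slash to a backslash.
    return ("/" + internal_name).replace("/", "\\")


if __name__ == "__main__":
    print(to_unc("/f/foo/bar/gum.html"))
```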

[[We need some plain talk about links.]]



