httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ken Parzygnat" <kp...@raleigh.ibm.com>
Subject Canonical filename overhaul
Date Wed, 14 Oct 1998 22:17:25 GMT
ATTENTION: LONG POST!!!!

Time to stir up this hornets nest again.

I've been working on Apache PR's for Win32 for
a short time, but I've found that about 90% of
the problems are boiling down to a bug in
ap_os_canonical_filename.

There are still problems lying around that
relate to this, and there are still more to 
come.  For example, the <Files> directive seems
to be completely broken on Win32.

I've spent the last few days digging through
the new-http archives, and looking at the code
to see what the problems have been, and what
the discussions have been.  I really believe
that ap_os_canonical_filename is far more
complicated than it needs to be.  Therefore,
I've decided to overhaul the entire process
and what I have so far looks like it may work. 

One thing to keep in mind is that the UNIX
ap_os_canonical_filename is a no-op.  That is,
it returns exactly what it gets.  This is 
partly why I believe the Win32 version needs
to be very simple, and only do what it minimally
needs to do to transform a path into a 
well known format.

Part of my testing involved carving out
ap_os_canonical_filename and placing it in a 
test program so I could examine its inputs and
outputs.  Here is a sample of some areas which I feel 
the routine is not behaving as it should.

Here are some of the results:
(The Labeling here is as follows:
  "Path: [path]" means 'path' was the input
  "Old : [path]" means 'path' was output from the
                 existing ap_os_canonical_filename)
                

Path: [*.perl]
Old : [f:/work/canon/debug/*.perl]

This really seems wrong to me.  ap_os_canonical_filename
has defaulted the current drive and current path.  This
is the primary reason that the <Files> directive is broken
on Win32.  In fact, the current drive and current path
are defaulted onto any string which is not a UNC name.  It
seems that we should never do this, we should only be 
getting default drive and path information from ServerRoot
or DocumentRoot.  

Path: [*.PERL]
Old : [f:/work/canon/debug/*.PERL]

This looks a lot like the last one, but you'll notice that
the case is still upper case.  This is because 
ap_os_canonical_filename could not find this file, so it
didn't change the case.  Again, this breaks cases were we
are looking for matches such as the <Files> directive.

Path: [f:\apache\htdocs\..\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]

This does not seem correct to me because we have eliminated
the valid path component '..'.  We should leave this for calls
to ap_getparents.  

Path: [//Apache/HTdocs]
Old : [//Apache/HTdocs\]

Path: [//Grumble////hello//world]
Old : [//Grumble////hello\//world]

It's pretty self evident what has messed up here.

Path: [hello.html]
Old : [f:/work/canon/debug/hello.html]

Path: [f:]
Old : [f:/work/canon/debug/*]

Path: [/]
Old : [f:/]

Path: [*]
Old : [f:/work/canon/debug/*]

More assumptions of drives and paths.

Path: [f:\APACHE\HTDOCS1\Welcome.html]
Old : [f:/apache/HTDOCS1/Welcome.html]

Path: [f:\APACHE\HTDOCS\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]

In the above two, you'll notice that one version
got lower cased, while the other partly did.  This
is because f:\apache\htdocs\Welcome.html exists
on my system. However, there is no f:\apache\htdocs1
directory.  Seems to me that we want to be sure
when we are lowercasing and when we are not.


Path: [.\]
Old : [ASSERT ERROR!!!!!!!]

Path: [...\]
Old : [ASSERT ERROR!!!!!!!

Path: [...\.]
Old : [ASSERT ERROR!!!!!!!]

Path: [..]
Old : [ASSERT ERROR!!!!!!!]

These are just a few of the cases that just
assert with the current routine.  The list
goes on.  Granted some are handled because
we call ap_getparents before the call to this
routine.  However, I wouldn't want to bet
that everything is caught.



OK, catch your breath....


Knowing that the Unix version of 
ap_os_canonical_filename is a no-op means that
there is lots of code in the server to do basic
path crunching (i.e. '..' and '.' and '///' etc).
So, it seems that we only need ap_os_canonical_filename
to change the path into a format 
that the existing code can deal with, and accomplish this
by doing as little as possible.  We want to try to 
assume as little as possible, because certainly
/foo/bar/welcome.html is valid on Win32.

So why not just lowercase everything and change all
'\' to '/'?  Because you still have the case issue
lurking for things like PATH_INFO.  And for PATH_INFO
you want to be sure that you preserve the case that
you are given.  Therefore, in my overhaul, I'm proposing
two functions:
ap_os_canonical_filename
ap_os_case_canonical_filename

ap_os_case_canonical_filename takes a path and formats
it without changing any case of the input components.

ap_os_canonical_filename does the same as above, but
calls strlwr on the output string.

The only place I've found that needs 
ap_os_case_canonical_filename is in directory_walk.  I'll
discuss that later.

So this is what I think ap_os_case_canonical_filename needs
to do:
1. Change '\' to '/'
2. Eliminate trailing '.' characters.  I felt this was
   important as a security issue since Win32 will ignore
   a trailing '.'.
3. Eliminate directories with more than 2 '.'s.  While on
   Unix it is valid to create a directory '...', it is 
   not on Win32.  This is a very suspect directory, and
   therefore, as a security issue, I've opted to eliminate
   it.  This could be up for discussion.
   
That's it!  This eliminated the recursive calls, calls
to Win32 functions like GetFullPathName and FindFirstFile
that exist in sub_canonical_filename; in fact there is
no more sub_canonical_filename.  I believe this greatly 
simplifies that entire code path.

Now, let me rehash the output above, but include
the new routine output:

Path: [*.perl]
Old : [f:/work/canon/debug/*.perl]
ap_os_case_canonical_filename : [*.perl]
ap_os_canonical_filename      : [*.perl]

Path: [*.PERL]
Old : [f:/work/canon/debug/*.PERL]
ap_os_case_canonical_filename : [*.PERL]
ap_os_canonical_filename      : [*.perl]

Notice the new routine does not add a default
drive and path.  When parsing the <Files> directive,
the server uses ap_os_canonical_filename so it will
always get lower case.

Path: [f:\apache\htdocs\..\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/../Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/../welcome.html]

The '..' is preserved because it is valid.

Path: [//Apache/HTdocs]
Old : [//Apache/HTdocs\]
ap_os_case_canonical_filename : [//Apache/HTdocs]
ap_os_canonical_filename      : [//apache/htdocs]

Path: [//Grumble////hello//world]
Old : [//Grumble////hello\//world]
ap_os_case_canonical_filename : [//Grumble////hello//world]
ap_os_canonical_filename      : [//grumble////hello//world]

Since we are simpler, we don't get hung up on more complicated
cases.

Path: [hello.html]
Old : [f:/work/canon/debug/hello.html]
ap_os_case_canonical_filename : [hello.html]
ap_os_canonical_filename      : [hello.html]

Path: [f:]
Old : [f:/work/canon/debug/*]
ap_os_case_canonical_filename : [f:]
ap_os_canonical_filename      : [f:]

Path: [/]
Old : [f:/]
ap_os_case_canonical_filename : [/]
ap_os_canonical_filename      : [/]

Path: [*]
Old : [f:/work/canon/debug/*]
ap_os_case_canonical_filename : [*]
ap_os_canonical_filename      : [*]

No assumptions of drives and paths.

Path: [f:\APACHE\HTDOCS1\Welcome.html]
Old : [f:/apache/HTDOCS1/Welcome.html]
ap_os_case_canonical_filename : [f:/APACHE/HTDOCS1/Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs1/welcome.html]

Path: [f:\APACHE\HTDOCS\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/APACHE/HTDOCS/Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/welcome.html]

We always get predictable output.


Path: [f:\apache...\htdocs..\NoDocs.....\World..\Hi.html]
Old : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_canonical_filename      : [f:/apache/htdocs/nodocs/world/hi.html]

Path: [f:\apache.\htdocs.\NoDocs.\World.\Hi.html]
Old : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_canonical_filename      : [f:/apache/htdocs/nodocs/world/hi.html]

Path: [f:\apache\htdocs\Welcome.html....]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/welcome.html]

Path: [f:\apache\...\htdocs\..\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/../Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/../welcome.html]

Just wanted to demonstrate the elimination of trailing '.'

Path: [.\]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [./]
ap_os_canonical_filename      : [./]

Path: [...\]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [/]
ap_os_canonical_filename      : [/]

Path: [...\.]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [/]
ap_os_canonical_filename      : [/]

Path: [...]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : []
ap_os_canonical_filename      : []

Path: [..]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [..]
ap_os_canonical_filename      : [..]


I'll put the patches into a later
post since this one is getting so big.  However,
I did want to bring up the one change I 
made to http_request.c since it is the only
change that will affect the UNIX code path.  Here's the patch:

--- http_request.c.first	Tue Oct 06 19:12:24 1998
+++ http_request.c	Tue Oct 13 21:57:14 1998
@@ -359,16 +359,19 @@
         return OK;
     }
 
-    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
-    test_filename = ap_pstrdup(r->pool, r->filename);
-
-    ap_no2slash(test_filename);
-    num_dirs = ap_count_dirs(test_filename);
+    r->filename   = ap_os_case_canonical_filename(r->pool, r->filename);
 
     res = get_path_info(r);
     if (res != OK) {
         return res;
     }
+
+    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
+
+    test_filename = ap_pstrdup(r->pool, r->filename);
+
+    ap_no2slash(test_filename);
+    num_dirs = ap_count_dirs(test_filename);
 
     if ((res = check_safe_file(r))) {
         return res;

This is in directory_walk.  The existing code calls 
ap_os_canonical_filename then get_path_info.  PATH_INFO
should maintain the case that is passed in on the URI.
The only reason it works today is because 
ap_os_canonical_filename does not find the path, and
therefore does not change the case.  I've modified this
to be a little more explicit.  I call 
ap_os_case_canonical_filename which will maintain case, 
then call get_path_info.  After that, I call 
ap_os_canonical_filename which will put it in the known form
that server will use internally.  Remember, both these
calls are no-ops on Unix. 

I've tested the code on my NT box and ran simple GETs, 
CGIs, Alias directive tests, and Files directive tests, and
everything seems to be working fine.  Now I need for someone
else to look at all of this and see if there is a hole.

Ok, that's it.  I'll be happy to post my patches as soon
as someone wants, but I wanted to give you all the 
information I had so you could chew on it.

Thanks for taking time.
- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com 

Mime
View raw message