httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Darroch <chr...@pearsoncmg.com>
Subject walk caching to avoid extra authnz
Date Wed, 06 Dec 2006 01:06:16 GMT
Hi --

   The short version of this email is, please, can people review
this patch for server/request.c and see if it breaks anything?
There are versions for trunk and 2.2.x.  Thanks in advance!

http://people.apache.org/~chrisd/patches/walk_cache/


   I recently finished setting up mod_dav behind mod_authn_dbd, and
while we were testing I noticed that certain WebDAV clients
(*ahem, DAVExplorer*) send PROPFIND requests that cause mod_dav to
recurse down through the contents of the requested directory,
generating sub-requests for each file or directory it encounters.
Presumably such clients send a large DAV depth header value; I didn't
really probe the exact request contents.

   What surprised me was discovering that each sub-request (or,
equally, internal redirect) went through the authnz steps despite
the fact that I just had a single blanket authnz configuration for
the entire directory on which I'd enabled mod_dav.  This meant
that if a user happened to request a directory with, say, 1000 files,
each file triggered a mod_authn_dbd request to the database.
Clearly, this was going to drag performance into the ground if
released as-was, since we have lots of really large directories.

   It surprised me in part because in ap_process_request_internal()
there are comments and code like:

    /* Skip authn/authz if the parent or prior request passed the authn/authz,
     * and that configuration didn't change (this requires optimized _walk()
     * functions in map_to_storage that use the same merge results given
     * identical input.)  If the config changes, we must re-auth.
     */
    if (r->main && (r->main->per_dir_config == r->per_dir_config)) {
        r->user = r->main->user;
        r->ap_auth_type = r->main->ap_auth_type;
    }
    else if (r->prev && (r->prev->per_dir_config == r->per_dir_config))
{
        r->user = r->prev->user;
        r->ap_auth_type = r->prev->ap_auth_type;
    }

And in make_sub_request():

    /* Start a clean config from this subrequest's vhost.  Optimization in
     * Location/File/Dir walks from the parent request assure that if the
     * config blocks of the subrequest match the parent request, no merges
     * will actually occur (and generally a minimal number of merges are
     * required, even if the parent and subrequest aren't quite identical.)
     */

   But, as it turned out, the ap_*_walk() functions only set
r->per_dir_config to r->main->per_dir_config (or r->prev->per_dir_config)
if the URI or filename exactly matches.  The relevant bits look like:

    /* If we have an cache->cached location that matches r->uri,
     * and the vhost's list of locations hasn't changed, we can skip
     * rewalking the location_walk entries.
     */
    if (cache->cached
        && (cache->dir_conf_tested == sec_ent)
        && (strcmp(entry_uri, cache->cached) == 0)) {
        ...
        if (r->per_dir_config == cache->dir_conf_merged) {
            r->per_dir_config = cache->per_dir_result;
            return OK;
        }

   This is why all the sub-requests generated by mod_dav, where a
sub-request's r->uri might be "/foo/bar.html" while the main request's
r->uri was "/foo/", were each getting their own r->per_dir_config,
so that ap_process_request_internal() re-ran all the authnz checks
for each one.

   However, the code in the ap_*_walk() functions for these cases
looked like it might be very close to being able to recognize them
as not requiring their own privately merged r->per_dir_config.  As the
functions step through the config sections they detect any variation
from the sections which matched for the previous or parent request;
if none is found, now_merged is set to the last set of merged configs
from the cache and that's then merged onto r->per_dir_config.

   If they could also detect that r->per_dir_config was the same
as cache->dir_conf_merged -- that is, if the previous or parent
request started from the same config -- then maybe they could set
r->per_dir_config to cache->per_dir_result just like the case
where the URI or filenames matched exactly.

   That's what this patch does, in sum.  There's obviously one
big assumption here, that it's OK to have lots of sub-requests
and internal redirects pointing to the original request's
r->per_dir_config, even if their URIs don't match.  So far it seems
to work OK for me, but there may be problems I haven't perceived or
stumbled on!  So, testing and review would be much appreciated.


   The changes to the ap_*_walk() functions are pretty clear,
I think.  The big change is to prep_walk_cache().  The problem I
had there was that ap_process_request_internal() resets
r->per_dir_config to r->server->lookup_defaults after the
first call to ap_location_walk().  Here's a crummy diagram of
what seems to happen to the r->dir_conf_merged and r->per_dir_result
values in the cache as you pass through the initial request and
then a sub-request (S = r->server->lookup_defaults):

                           dir_conf_merged    per_dir_result
init: ap_location_walk:    S               -> A
      (reset)
      ap_directory_walk:   S               -> B
      ap_file_walk:        B               -> C
      ap_location_walk:    C               -> D   (overwrites cache)

sub:  ap_location_walk:    S               -> W
      (reset)
      ap_directory_walk:   S               -> X
      ap_file_walk:        X               -> Y
      ap_location_walk:    Y               -> Z   (overwrites cache)

   So the first call to ap_location_walk() for the sub-request
sees the C -> D values, while the second sees the S -> W values.
This wasn't going to work with my idea, because I needed each call
to an ap_*_walk() function in the sub-request to check its
r->per_dir_config against the r->dir_conf_merged from the corresponding
invocation of the same function in the initial request.  Thus, the patch
introduces a singly-linked list of walk_cache_t structures and a
count value so that prep_walk_cache() can look up the walk cache
for the corresponding invocation of each function.


   One other small thing this patch does is reverse the checks for
r->main and r->prev in a couple of places.  Reviewing how sub-requests
(r->main set) and internal redirects (r->prev set) are created in
internal_internal_redirect() and ap_set_sub_req_protocol(), it
looked to me as though r->prev is NULL for sub-requests, but r->main
is r->prev->main for internal redirects.  So, if one imagines a chain
of them, say:

init -> sub1 -> sub2 -> redir1 -> sub3

then to step backwards through the chain, you want to step back to
r->prev first (if it's non-NULL) because otherwise you'll step from
redir1 back to sub1, skipping sub2.  (While redir1->prev == sub2,
redir1->main == sub2->main == sub1.  I think.)  At any rate, this seems
to be how log_backtrace() works, so I figured it was probably worth
making request.c conform.  If it breaks anything, though, it's not
actually relevant to the walk cache changes and could be left as-is.

   So ... comments, criticisms?  What's the policy on changes like
this going into trunk, if I don't hear much from anyone?

   Thanks all,

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Mime
View raw message