Received: (from majordom@localhost) by hyperreal.org (8.8.5/8.8.5) id CAA16651; Fri, 1 Aug 1997 02:36:52 -0700 (PDT) Received: from twinlark.arctic.org (twinlark.arctic.org [204.62.130.91]) by hyperreal.org (8.8.5/8.8.5) with SMTP id CAA16532 for ; Fri, 1 Aug 1997 02:36:39 -0700 (PDT) Received: (qmail 5916 invoked from network); 1 Aug 1997 09:35:18 -0000 Received: from benchlark.arctic.org (206.221.201.235) by twinlark.arctic.org with SMTP; 1 Aug 1997 09:35:18 -0000 Date: Fri, 1 Aug 1997 02:36:30 -0700 (PDT) From: Dean Gaudet To: new-httpd@apache.org Subject: [PATCH] directory_walk optimization (take 1) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: new-httpd-owner@apache.org Precedence: bulk Reply-To: new-httpd@apache.org This is not the final product. It works fine, could use some more testing. But more importantly, there are some issues we need to resolve. First of all the main goal in brief: directory_walk currently is O(N*M) where N is the number of sections, and M is the number of components in the filename for the request. It's possible to make this O(N+M) by imposing a few restrictions: - * wildcarding respects slash, just like it does in sh - regex matching is performed last (or first) after (before) all others Issues: 1. I didn't want to reimplement fnmatch, which implements sh-style wildcarding. This patch includes the GNU fnmatch.[ch], which of course doesn't help with our license (even though it's under GLPL). So it needs to be replaced... help? 2. Right now this does regex sections last, which is what I said I'd do. But in doing so I've realised that there is a security bug in 1.2... it's similar to a bug I fixed during the 1.2 betas: If there is a regex-style section that matches files, and a sub_req_lookup_file is performed on a relative file that matches it, the section won't be merged. mod_autoindex, and mod_include use sub_req_lookup_file. This comes into effect when they lookup a file "foobar" which is in the same directory as the parent request. There is an optimization in sub_req_lookup_file which completely skips the directory_walk() (and hence the section that matches). This optimization is desirable. In the 1.2 betas I fixed a similar problem involving sections that weren't matched... not remembering that regex-s could also match. My proposal to fix this problem in 1.3 is to completely remove regex s. Why? Well, you get the exact same functionality by using regex . I'm convinced there's nothing that we lose by removing regex . This simplifies the new directory_walk code a bit, and it allows us to keep the sub_req_lookup_file optimization. For 1.2 I just propose we document the existance of the bug, and suggest people switch immediately to using regex , and indicate that regex will be removed in 1.3. 3. This patch doesn't yet deal with PR#717... for which we've got a little reordering to do with how is merged from the main server into virtual hosts. Right now we do the opposite of what should be done for . (And probably but I'm getting sleepy and can't think straight.) Dean Index: Makefile.tmpl =================================================================== RCS file: /export/home/cvs/apache/src/Makefile.tmpl,v retrieving revision 1.53 diff -u -r1.53 Makefile.tmpl --- Makefile.tmpl 1997/07/31 20:56:24 1.53 +++ Makefile.tmpl 1997/08/01 09:15:28 @@ -11,7 +11,7 @@ OBJS= alloc.o http_main.o http_core.o http_config.o http_request.o \ http_log.o http_protocol.o rfc1413.o util.o util_script.o modules.o buff.o\ md5c.o util_md5.o explain.o http_bprintf.o util_date.o util_snprintf.o\ - $(MODULES) + fnmatch.o $(MODULES) .c.o: $(CC) -c $(INCLUDES) $(CFLAGS) $(SPACER) $< Index: fnmatch.c =================================================================== RCS file: fnmatch.c diff -N fnmatch.c --- /dev/null Fri Aug 1 02:15:00 1997 +++ fnmatch.c Fri Aug 1 02:15:29 1997 @@ -0,0 +1,266 @@ +/* Copyright (C) 1991 Free Software Foundation, Inc. +This file is part of the GNU C Library. + +The GNU C Library is free software; you can redistribute it and/or +modify it under the terms of the GNU Library General Public License as +published by the Free Software Foundation; either version 2 of the +License, or (at your option) any later version. + +The GNU C Library is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Library General Public License for more details. + +You should have received a copy of the GNU Library General Public +License along with the GNU C Library; see the file COPYING.LIB. If +not, write to the Free Software Foundation, Inc., 675 Mass Ave, +Cambridge, MA 02139, USA. */ + +#include +#include "fnmatch.h" + +#if !defined (__GNU_LIBRARY__) && !defined (STDC_HEADERS) +# if !defined (errno) +extern int errno; +# endif /* !errno */ +#endif + +/* Match STRING against the filename pattern PATTERN, returning zero if + it matches, FNM_NOMATCH if not. */ +int +fnmatch (const char *pattern, const char *string, int flags) +{ + register const char *p = pattern, *n = string; + register char c; + + if ((flags & ~__FNM_FLAGS) != 0) + { + errno = EINVAL; + return (-1); + } + + while ((c = *p++) != '\0') + { + switch (c) + { + case '?': + if (*n == '\0') + return (FNM_NOMATCH); + else if ((flags & FNM_PATHNAME) && *n == '/') + /* If we are matching a pathname, `?' can never match a `/'. */ + return (FNM_NOMATCH); + else if ((flags & FNM_PERIOD) && *n == '.' && + (n == string || ((flags & FNM_PATHNAME) && n[-1] == '/'))) + /* `?' cannot match a `.' if it is the first character of the + string or if it is the first character following a slash and + we are matching a pathname. */ + return (FNM_NOMATCH); + break; + + case '\\': + if (!(flags & FNM_NOESCAPE)) + { + c = *p++; + if (c == '\0') + return (FNM_NOMATCH); + } + if (*n != c) + return (FNM_NOMATCH); + break; + + case '*': + if ((flags & FNM_PERIOD) && *n == '.' && + (n == string || ((flags & FNM_PATHNAME) && n[-1] == '/'))) + /* `*' cannot match a `.' if it is the first character of the + string or if it is the first character following a slash and + we are matching a pathname. */ + return (FNM_NOMATCH); + + /* Collapse multiple consecutive, `*' and `?', but make sure that + one character of the string is consumed for each `?'. */ + for (c = *p++; c == '?' || c == '*'; c = *p++) + { + if ((flags & FNM_PATHNAME) && *n == '/') + /* A slash does not match a wildcard under FNM_PATHNAME. */ + return (FNM_NOMATCH); + else if (c == '?') + { + if (*n == '\0') + return (FNM_NOMATCH); + /* One character of the string is consumed in matching + this ? wildcard, so *??? won't match if there are + fewer than three characters. */ + n++; + } + } + + if (c == '\0') + return (0); + + /* General case, use recursion. */ + { + char c1 = (!(flags & FNM_NOESCAPE) && c == '\\') ? *p : c; + for (--p; *n != '\0'; ++n) + /* Only call fnmatch if the first character indicates a + possible match. */ + if ((c == '[' || *n == c1) && + fnmatch (p, n, flags & ~FNM_PERIOD) == 0) + return (0); + return (FNM_NOMATCH); + } + + case '[': + { + /* Nonzero if the sense of the character class is inverted. */ + register int not; + + if (*n == '\0') + return (FNM_NOMATCH); + + /* A character class cannot match a `.' if it is the first + character of the string or if it is the first character + following a slash and we are matching a pathname. */ + if ((flags & FNM_PERIOD) && *n == '.' && + (n == string || ((flags & FNM_PATHNAME) && n[-1] == '/'))) + return (FNM_NOMATCH); + + /* POSIX.2 2.8.3.1.2 says: `An expression containing a `[' that + is not preceded by a backslash and is not part of a bracket + expression produces undefined results.' This implementation + treats the `[' as just a character to be matched if there is + not a closing `]'. This code will have to be changed when + POSIX.2 character classes are implemented. */ + { + register const char *np; + + for (np = p; np && *np && *np != ']'; np++) + ; + + if (np && !*np) + { + if (*n != '[') + return (FNM_NOMATCH); + break; + } + } + + not = (*p == '!' || *p == '^'); + if (not) + ++p; + + c = *p++; + for (;;) + { + register char cstart, cend; + + /* Initialize cstart and cend in case `-' is the last + character of the pattern. */ + cstart = cend = c; + + if (!(flags & FNM_NOESCAPE) && c == '\\') + { + if (*p == '\0') + return FNM_NOMATCH; + cstart = cend = *p++; + } + + if (c == '\0') + /* [ (unterminated) loses. */ + return (FNM_NOMATCH); + + c = *p++; + + if ((flags & FNM_PATHNAME) && c == '/') + /* [/] can never match. */ + return (FNM_NOMATCH); + + /* This introduces a range, unless the `-' is the last + character of the class. Find the end of the range + and move past it. */ + if (c == '-' && *p != ']') + { + cend = *p++; + if (!(flags & FNM_NOESCAPE) && cend == '\\') + cend = *p++; + if (cend == '\0') + return (FNM_NOMATCH); + + c = *p++; + } + + if (*n >= cstart && *n <= cend) + goto matched; + + if (c == ']') + break; + } + if (!not) + return (FNM_NOMATCH); + break; + + matched: + /* Skip the rest of the [...] that already matched. */ + while (c != ']') + { + if (c == '\0') + /* [... (unterminated) loses. */ + return (FNM_NOMATCH); + + c = *p++; + if (!(flags & FNM_NOESCAPE) && c == '\\') + { + if (*p == '\0') + return FNM_NOMATCH; + /* XXX 1003.2d11 is unclear if this is right. */ + ++p; + } + } + if (not) + return (FNM_NOMATCH); + } + break; + + default: + if (c != *n) + return (FNM_NOMATCH); + } + + ++n; + } + + if (*n == '\0') + return (0); + + return (FNM_NOMATCH); +} + +/* Return nonzero if PATTERN has any special globbing chars in it. */ +int +is_fnmatch (const char *pattern) +{ + register const char *p = pattern; + register char c; + int open = 0; + + while ((c = *p++) != '\0') + switch (c) + { + case '?': + case '*': + return (1); + + case '[': /* Only accept an open brace if there is a close */ + open++; /* brace to match it. Bracket expressions must be */ + continue; /* complete, according to Posix.2 */ + case ']': + if (open) + return (1); + continue; + + case '\\': + if (*p++ == '\0') + return (0); + } + + return (0); +} Index: fnmatch.h =================================================================== RCS file: fnmatch.h diff -N fnmatch.h --- /dev/null Fri Aug 1 02:15:00 1997 +++ fnmatch.h Fri Aug 1 02:15:29 1997 @@ -0,0 +1,38 @@ +/* Copyright (C) 1991 Free Software Foundation, Inc. +This file is part of the GNU C Library. + +The GNU C Library is free software; you can redistribute it and/or +modify it under the terms of the GNU Library General Public License as +published by the Free Software Foundation; either version 2 of the +License, or (at your option) any later version. + +The GNU C Library is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Library General Public License for more details. + +You should have received a copy of the GNU Library General Public +License along with the GNU C Library; see the file COPYING.LIB. If +not, write to the Free Software Foundation, Inc., 675 Mass Ave, +Cambridge, MA 02139, USA. */ + +#ifndef _FNMATCH_H + +#define _FNMATCH_H 1 + +/* Bits set in the FLAGS argument to `fnmatch'. */ +#define FNM_PATHNAME (1 << 0)/* No wildcard can ever match `/'. */ +#define FNM_NOESCAPE (1 << 1)/* Backslashes don't quote special chars. */ +#define FNM_PERIOD (1 << 2)/* Leading `.' is matched only explicitly. */ +#define __FNM_FLAGS (FNM_PATHNAME|FNM_NOESCAPE|FNM_PERIOD) + +/* Value returned by `fnmatch' if STRING does not match PATTERN. */ +#define FNM_NOMATCH 1 + +/* Match STRING against the filename pattern PATTERN, + returning zero if it matches, FNM_NOMATCH if not. */ +extern int fnmatch (const char *pattern, const char *string, int flags); + +extern int is_fnmatch (const char *); + +#endif /* fnmatch.h */ Index: http_config.c =================================================================== RCS file: /export/home/cvs/apache/src/http_config.c,v retrieving revision 1.68 diff -u -r1.68 http_config.c --- http_config.c 1997/07/28 18:22:42 1.68 +++ http_config.c 1997/08/01 09:15:32 @@ -1096,6 +1096,9 @@ { server_rec *virt; + /* XXX: this is really something that should be dealt with by a + * post-config api phase */ + core_reorder_directories (p, main_server); for (virt = main_server->next; virt; virt = virt->next) { merge_server_configs (p, main_server->module_config, virt->module_config); @@ -1127,6 +1130,8 @@ if (virt->send_buffer_size == 0) virt->send_buffer_size = main_server->send_buffer_size; + + core_reorder_directories (p, virt); } } Index: http_config.h =================================================================== RCS file: /export/home/cvs/apache/src/http_config.h,v retrieving revision 1.41 diff -u -r1.41 http_config.h --- http_config.h 1997/07/28 18:22:43 1.41 +++ http_config.h 1997/08/01 09:15:33 @@ -299,6 +299,8 @@ void *create_per_dir_config (pool *p); void *merge_per_dir_configs (pool *p, void *base, void *new); +void core_reorder_directories (pool *, server_rec *); + /* For http_core.c... ( command and virtual hosts) */ int parse_htaccess(void **result, request_rec *r, int override, Index: http_core.c =================================================================== RCS file: /export/home/cvs/apache/src/http_core.c,v retrieving revision 1.103 diff -u -r1.103 http_core.c --- http_core.c 1997/07/31 07:51:34 1.103 +++ http_core.c 1997/08/01 09:15:37 @@ -62,6 +62,7 @@ #include "rfc1413.h" #include "util_md5.h" #include "scoreboard.h" +#include "fnmatch.h" /* Server core module... This module provides support for really basic * server operations, including options and commands which control the @@ -84,8 +85,8 @@ if (!dir || dir[strlen(dir) - 1] == '/') conf->d = dir; else if (strncmp(dir,"proxy:",6)==0) conf->d = pstrdup (a, dir); else conf->d = pstrcat (a, dir, "/", NULL); - conf->d_is_matchexp = conf->d ? (is_matchexp( conf->d ) != 0) : 0; - + conf->d_is_fnmatch = conf->d ? (is_fnmatch (conf->d) != 0) : 0; + conf->d_components = conf->d ? count_dirs (conf->d) : 0; conf->opts = dir ? OPT_UNSET : OPT_ALL; conf->opts_add = conf->opts_remove = OPT_NONE; @@ -129,7 +130,8 @@ } conf->d = new->d; - conf->d_is_matchexp = new->d_is_matchexp; + conf->d_is_fnmatch = new->d_is_fnmatch; + conf->d_components = new->d_components; conf->r = new->r; if (new->opts != OPT_UNSET) conf->opts = new->opts; @@ -235,6 +237,73 @@ *new_space = url_config; } +/* This routine reorders the directory sections such that the 1-component + * sections come first, then the 2-component, and so on, finally followed + * by the regex sections. All movements are in-order to preserve the + * ordering of the sections from the config files. See directory_walk(). + */ +void core_reorder_directories (pool *p, server_rec *s) +{ + core_server_config *sconf; + array_header *old_sec; + array_header *new_sec; + int nelts; + unsigned n_components; + unsigned next_n_components; + core_dir_config *entry_core; + int still_more_non_regex_to_go; + int i; + void **elts; + + sconf = get_module_config (s->module_config, &core_module); + old_sec = sconf->sec; + nelts = old_sec->nelts; + elts = (void **)old_sec->elts; + new_sec = make_array (p, nelts, sizeof(void *)); + + /* First collect all the 1 component names, then the 2 componennt names, + * and so on. We use next_n_components to know what to look for the + * next time around... to deal with weird configs with many many many + * in at least one . + */ + n_components = 1; + do { + /* guess there's none left other than ones with exactly n_components */ + still_more_non_regex_to_go = 0; + /* guess that what's left has infinite components */ + next_n_components = ~0u; + for (i = 0; i < nelts; ++i) { + if (elts[i] == NULL) continue; + entry_core = (core_dir_config *)get_module_config (elts[i], + &core_module); + if (entry_core->r) continue; + if (entry_core->d_components != n_components) { + /* oops, the guess was wrong */ + still_more_non_regex_to_go = 1; + if (entry_core->d_components < next_n_components) { + next_n_components = entry_core->d_components; + } + continue; + } + *(void **)push_array (new_sec) = elts[i]; + elts[i] = NULL; + } + n_components = next_n_components; + } while (still_more_non_regex_to_go); + + /* anything left is a regex */ + for (i = 0; i < nelts; ++i) { + if (elts[i] == NULL) continue; + *(void **)push_array (new_sec) = elts[i]; + } + + /* XXX: in theory we could have allocated new_sec from the ptemp + * pool, and then memcpy'd it over top of old_sec ... oh well, + * we're wasting some ram here. + */ + sconf->sec = new_sec; +} + /***************************************************************** * * There are some elements of the core config structures in which @@ -733,7 +802,7 @@ conf = (core_dir_config *)get_module_config(new_url_conf, &core_module); conf->d = pstrdup(cmd->pool, cmd->path); /* No mangling, please */ - conf->d_is_matchexp = is_matchexp( conf->d ) != 0; + conf->d_is_fnmatch = is_fnmatch( conf->d ) != 0; conf->r = r; add_per_url_conf (cmd->server, new_url_conf); @@ -789,7 +858,7 @@ conf = (core_dir_config *)get_module_config(new_file_conf, &core_module); conf->d = pstrdup(cmd->pool, cmd->path); - conf->d_is_matchexp = is_matchexp( conf->d ) != 0; + conf->d_is_fnmatch = is_fnmatch( conf->d ) != 0; conf->r = r; add_file_conf (c, new_file_conf); Index: http_core.h =================================================================== RCS file: /export/home/cvs/apache/src/http_core.h,v retrieving revision 1.25 diff -u -r1.25 http_core.h --- http_core.h 1997/07/30 18:41:51 1.25 +++ http_core.h 1997/08/01 09:15:38 @@ -129,8 +129,10 @@ typedef unsigned char overrides_t; typedef struct { - /* path of the directory/regex/etc. see also d_is_matchexp below */ + /* path of the directory/regex/etc. see also d_is_fnmatch below */ char *d; + /* the number of slashes in d */ + unsigned d_components; allow_options_t opts; allow_options_t opts_add; @@ -170,11 +172,11 @@ int content_md5 : 2; /* calculate Content-MD5? */ - /* since is_matchexp(conf->d) was being called so frequently in + /* since is_fnmatch(conf->d) was being called so frequently in * directory_walk() and its relatives, this field was created and * is set to the result of that call. */ - int d_is_matchexp : 1; + int d_is_fnmatch : 1; /* System Resource Control */ #ifdef RLIMIT_CPU Index: http_request.c =================================================================== RCS file: /export/home/cvs/apache/src/http_request.c,v retrieving revision 1.67 diff -u -r1.67 http_request.c --- http_request.c 1997/08/01 08:01:21 1.67 +++ http_request.c 1997/08/01 09:15:40 @@ -69,6 +69,7 @@ #include "http_log.h" #include "http_main.h" #include "scoreboard.h" +#include "fnmatch.h" /***************************************************************** * @@ -254,7 +255,7 @@ char *test_dirname, *test_htaccess; int num_dirs, res; - int i, test_filename_len; + int i, j, test_filename_len; /* Are we dealing with a file? If not, we can (hopefuly) safely assume * we have a handler that doesn't require one, but for safety's sake, @@ -305,8 +306,8 @@ if (!regexec(entry_core->r, test_filename, 0, NULL, 0)) this_conf = entry_config; } - else if (entry_core->d_is_matchexp) { - if (!strcmp_match(test_filename, entry_dir)) + else if (entry_core->d_is_fnmatch) { + if (!fnmatch (entry_dir, test_filename, FNM_PATHNAME)) this_conf = entry_config; } else if (!strncmp (test_filename, entry_dir, strlen(entry_dir))) @@ -345,13 +346,14 @@ */ test_dirname = palloc (r->pool, test_filename_len+1); test_htaccess = NULL; + /* j keeps track of which section we're on, see core_reorder_directories */ + j = 0; for (i = 1; i <= num_dirs; ++i) { core_dir_config *core_dir = (core_dir_config *)get_module_config(per_dir_defaults, &core_module); int overrides_here; void *this_conf, *htaccess_conf = NULL; char *test_dirname_tail; - int j; test_dirname_tail = make_dirstr_prefix (test_dirname, test_filename, i); @@ -369,34 +371,24 @@ * access.conf. */ - for (j = 0; j < num_sec; ++j) { + for (; j < num_sec; ++j) { void *entry_config = sec[j]; core_dir_config *entry_core; char *entry_dir; - if (!entry_config) continue; - entry_core = (core_dir_config *)get_module_config(entry_config, &core_module); entry_dir = entry_core->d; + + if (entry_core->r || entry_core->d_components > i) break; this_conf = NULL; - if (entry_core->r) { - if (!regexec(entry_core->r, test_dirname, 0, NULL, - (j == num_sec) ? 0 : REG_NOTEOL)) { - /* Don't try this wildcard again --- if it ends in '*' - * it'll match again, and subdirectories won't be able to - * override it... - */ - sec[j] = NULL; + if (entry_core->d_is_fnmatch) { + if (!fnmatch(entry_dir, test_dirname, FNM_PATHNAME)) { + sec[j] = NULL; this_conf = entry_config; } } - else if (entry_core->d_is_matchexp && - !strcmp_match(test_dirname, entry_dir)) { - sec[j] = NULL; - this_conf = entry_config; - } else if (!strcmp (test_dirname, entry_dir)) this_conf = entry_config; @@ -427,6 +419,22 @@ } + /* now match the regex sections */ + for ( ; j < num_sec; ++j) { + void *entry_config = sec[j]; + core_dir_config *entry_core; + + entry_core = + (core_dir_config *)get_module_config(entry_config, &core_module); + if (entry_core->r) { + if (!regexec(entry_core->r, test_dirname, 0, NULL, REG_NOTEOL)) { + per_dir_defaults = + merge_per_dir_configs (r->pool, per_dir_defaults, + entry_config); + } + } + } + r->per_dir_config = per_dir_defaults; /* Symlink permissions are determined by the parent. If the request is for @@ -494,9 +502,10 @@ if (!regexec(entry_core->r, test_location, 0, NULL, 0)) this_conf = entry_config; } - else if( entry_core->d_is_matchexp ) { - if (!strcmp_match(test_location, entry_url)) + else if( entry_core->d_is_fnmatch ) { + if (!fnmatch (entry_url, test_location, FNM_PATHNAME)) { this_conf = entry_config; + } } else if (!strncmp (test_location, entry_url, len) && (entry_url[len - 1] == '/' || @@ -556,9 +565,10 @@ if (!regexec(entry_core->r, test_file, 0, NULL, 0)) this_conf = entry_config; } - else if ( entry_core->d_is_matchexp ) { - if (!strcmp_match(test_file, entry_file)) + else if ( entry_core->d_is_fnmatch ) { + if (!fnmatch(entry_file, test_file, FNM_PATHNAME)) { this_conf = entry_config; + } } else if (!strncmp (test_file, entry_file, len) && (entry_file[len - 1] == '/' ||