httpd-dev mailing list archives

From "William A. Rowe, Jr." <>
Subject Optimizing dir_merge()
Date Thu, 16 Aug 2001 00:28:31 GMT
Here's my take on the dir_merge patch I offered up.  Please review for sanity.


Let's examine a very simple configuration, for mod_dir.

  static void *create_dir_config(apr_pool_t *p, char *dummy)
  {
      dir_config_rec *new =
      (dir_config_rec *) apr_pcalloc(p, sizeof(dir_config_rec));

      new->index_names = NULL;
      return (void *) new;
  }

In the server configuration phase, this function is called with the cmd->pool
context and the appropriate arguments.  The array member index_names is explicitly
set to NULL for clarity. (Since the dir_config_rec structure is apr_pcalloc'ed,
this is not strictly necessary.)

  static const char *add_index(cmd_parms *cmd, void *dummy, const char *arg)
  {
      dir_config_rec *d = dummy;

      /* create the array lazily, on first use of the directive */
      if (!d->index_names) {
          d->index_names = apr_array_make(cmd->pool, 2, sizeof(char *));
      }
      *(const char **)apr_array_push(d->index_names) = arg;
      return NULL;
  }

Each time the AddIndex directive is used, the add_index function is passed the cmd_parms,
including cmd->pool, and an argument allocated from that same pool.  When the arg
is pushed onto the index_names array, all references are consistently in the same pool.

Note that index_names is only created when the user actually uses the AddIndex directive in
their configuration.  This generally keeps dir_config_rec allocations very small.

For .htaccess overrides, the local r->pool is passed as the p argument to create_dir_config
and as the cmd->pool seen by add_index.  The .htaccess file is parsed and its dir_config_rec
structures are created before Apache tries to merge the httpd.conf dir_config_rec structures
with the .htaccess dir_config_rec structures.


To be completely clear, dir_merge is a misnomer.  This hook is called for every
<Directory >, <Location > and <Files > section that corresponds to a given request.

  static void *merge_dir_configs(apr_pool_t *p, void *basev, void *addv)
  {
      dir_config_rec *new = (dir_config_rec *) apr_pcalloc(p, sizeof(dir_config_rec));
      dir_config_rec *base = (dir_config_rec *) basev;
      dir_config_rec *add = (dir_config_rec *) addv;

      /* add overrides base when set; otherwise base is inherited */
      new->index_names = add->index_names ? add->index_names : base->index_names;
      return new;
  }
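
To see what "called for every section" means for the override rule, here is a minimal
standalone sketch.  It does not use APR: toy_dir_config, toy_merge and toy_effective_index
are hypothetical stand-ins invented for this illustration, with a single string pointer in
place of the array and caller-provided storage in place of a pool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dir_config_rec: one optional setting. */
typedef struct {
    const char *index_names;   /* NULL means "not set in this section" */
} toy_dir_config;

/* Same rule as merge_dir_configs above: add wins when it is set,
 * otherwise the value is inherited from base.  The result is written
 * to caller-provided storage, which plays the role of the pool. */
static toy_dir_config *toy_merge(toy_dir_config *result,
                                 const toy_dir_config *base,
                                 const toy_dir_config *add)
{
    assert(result != NULL && base != NULL && add != NULL);
    result->index_names = add->index_names ? add->index_names
                                           : base->index_names;
    return result;
}

/* Walk the sections that match a request, outermost first, merging
 * each one on top of the result so far.  Returns the effective value,
 * or NULL when no section sets it. */
static const char *toy_effective_index(const toy_dir_config *sections,
                                       size_t n)
{
    toy_dir_config merged = { NULL };
    toy_dir_config next;
    size_t i;

    for (i = 0; i < n; i++) {
        toy_merge(&next, &merged, &sections[i]);
        merged = next;
    }
    return merged.index_names;
}
```

With three matching sections where only the first and last set a value, the last one
wins; drop the last section and the first value is inherited through the unset middle one.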

Let's imagine for a moment that index_names were actually cumulative.

  static void *merge_dir_configs(apr_pool_t *p, void *basev, void *addv)
  {
      dir_config_rec *new = (dir_config_rec *) apr_pcalloc(p, sizeof(dir_config_rec));
      dir_config_rec *base = (dir_config_rec *) basev;
      dir_config_rec *add = (dir_config_rec *) addv;

      if (add->index_names && base->index_names) {
          int nelts = base->index_names->nelts + add->index_names->nelts;
          /* allocate from the pool we were handed, never from base's or add's */
          new->index_names = apr_array_make(p, nelts, sizeof(char *));
          memcpy(new->index_names->elts, base->index_names->elts,
                 base->index_names->nelts * sizeof(char *));
          /* elts is a char *, so cast before stepping past the base entries */
          memcpy((char **) new->index_names->elts + base->index_names->nelts,
                 add->index_names->elts,
                 add->index_names->nelts * sizeof(char *));
          new->index_names->nelts = nelts;
      }
      else {
          new->index_names = add->index_names ? add->index_names : base->index_names;
      }
      return new;
  }

Here, we must make a new array to combine the base and add index_names.  If we always
inserted the add->elts directly into the base->elts, we would be changing the initial
configuration!  You will discover this most often when the mistaken change leaves that base
record pointing into an .htaccess configuration, which is destroyed once the request is
finished.

This points to a very useful debugging technique to catch errors.  Always try adding your
module's directives into an .htaccess file.  That storage is very temporary, so if you
maul the base configuration, the server will segfault on subsequent requests once you
remove the .htaccess file and try the request again.


Some modules are so complex that creating copies on each Directory, Location and Files
merge is a huge performance penalty.  In order to build upon the base configuration,
two rules must be respected:

  . If the pool argument passed in merge_dir_configs doesn't match the base config,
    it's out of scope!  You must copy when the pool argument differs from the base!

  . Likewise, if you are changing parts of the add config, and the add config's pool 
    doesn't match the pool argument, that structure must be copied!
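
The two rules can be sketched as a copy-on-mismatch merge.  This is a standalone toy, not
APR code, and it rests on one loud assumption: that the config record remembers which pool
it was allocated from (the hypothetical owner field below).  The rules above do not say how
the comparison is made.  toy_pool, toy_dir_config and toy_merge are invented names, and
malloc stands in for allocating from p.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_NAMES 8

/* Hypothetical stand-in for a pool: only its identity matters here. */
typedef struct { int id; } toy_pool;

/* Hypothetical config record that remembers which pool owns it. */
typedef struct {
    const toy_pool *owner;
    int nelts;
    const char *names[TOY_MAX_NAMES];
} toy_dir_config;

/* Cumulative merge following the two rules:
 *  - base is extended in place only when it already lives in pool p;
 *  - otherwise a copy owned by p is made before anything is changed.
 * add is only read, never changed, so the second rule is met trivially.
 * Returns NULL if allocation fails or the result would not fit. */
static toy_dir_config *toy_merge(const toy_pool *p, toy_dir_config *base,
                                 const toy_dir_config *add)
{
    toy_dir_config *result = base;
    int n, i;

    assert(p != NULL && base != NULL && add != NULL);
    n = add->nelts;   /* read once, in case add and base are the same */
    if (base->nelts + n > TOY_MAX_NAMES)
        return NULL;

    if (base->owner != p) {
        /* base is out of scope for p: copy, then work on the copy */
        result = malloc(sizeof(*result));
        if (result == NULL)
            return NULL;
        memcpy(result, base, sizeof(*result));
        result->owner = p;
    }
    for (i = 0; i < n; i++)
        result->names[result->nelts++] = add->names[i];
    return result;
}
```

Merging a server-pool base in the request pool yields a fresh record and leaves the base
untouched.  Merging that result again in the same request pool extends it in place, which
is the saving the rules are meant to allow.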

The pool argument corresponds to the request pool, or the subrequest pool.  The base
config lives in either the process pool (for the initial configuration) or the request or
subrequest pool (from an .htaccess file, or once it has been copied).

In the future, the pool argument may also correspond to a cache pool, to pre-merge
selected directory configurations.  If the base corresponds to a cached config, and
the merge is requested for the request, the config will be copied if you follow the
rules above.

Subrequests (such as mod_autoindex, mod_negotiation and mod_include create) must also
avoid touching the main request's config.  This is safely done by testing
the pool and copying when the pool argument changes.  All .htaccess and main request
configs are left alone by respecting the two rules above.

So remember, if the pool fits, you may munge or expand the config record.  If the pool
changes, you must copy.

Subrequest Safety

If there are any changes to the subrequest's dir config, indicated by the pool changing
from r->pool to sub_req->pool, then the server expects a new dir config structure.
If this structure is _not_ changed, many of the common security and other hooks are
bypassed, eliminating redundant access and other checks.

Creating a new dir config structure every time the merge_dir_configs' pool argument has
changed avoids this potential hole.
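
A small sketch of why an in-place change is dangerous here.  It assumes the server's test
is a plain pointer comparison between the main request's and the subrequest's dir config,
which is one reading of the text above rather than something it spells out.  toy_request,
toy_dir_config and toy_needs_recheck are hypothetical names, not server APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-dir config with one security-relevant setting. */
typedef struct { int allow_listing; } toy_dir_config;

/* Hypothetical request record: only the config pointer matters here. */
typedef struct {
    const toy_dir_config *per_dir_config;
} toy_request;

/* Returns 1 when the access checks must be rerun for the subrequest,
 * 0 when they may be skipped because it shares the main request's
 * config structure. */
static int toy_needs_recheck(const toy_request *main_req,
                             const toy_request *sub)
{
    assert(main_req != NULL && sub != NULL);
    return sub->per_dir_config != main_req->per_dir_config;
}
```

If a merge for the subrequest edits the shared structure in place, the pointers still
match and the checks are skipped even though the settings changed.  Returning a new
structure makes the comparison fail, so the checks run again.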

Caching Considerations

The key point within Apache is: a given per-dir config cannot be trusted after a
subsequent merge_dir_configs callback in the same pool.  Any caching logic must create
a separate pool to build the cached dir config.  If that cache will build nested dir
configs, each sub-dir must be given a sub-pool.

As nested dir configs are unwound, the farthest descendants must be pruned first, since
they potentially reference one or all parent pools.  Pruned cache entries may be
set aside when stale, but they cannot be destroyed until all requests and sub-dir caches
that reference that cached config are gone.  This requires reference counting and
thread locking, and cannot be effectively implemented across processes today.
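
The pruning order can be sketched with plain reference counts.  This is a single-threaded
toy under stated assumptions: every name below (toy_cache_entry, toy_cache_new and so on)
is hypothetical, free() stands in for destroying an entry's pool, and the locking the text
calls for is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached dir config.  A child was merged from its parent
 * and may point into the parent's storage, so it holds a reference. */
typedef struct toy_cache_entry {
    struct toy_cache_entry *parent;
    int refs;    /* requests and child entries still using this one */
    int stale;   /* set aside, waiting for refs to reach zero */
} toy_cache_entry;

/* Create an entry; a child takes one reference on its parent. */
static toy_cache_entry *toy_cache_new(toy_cache_entry *parent)
{
    toy_cache_entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    e->parent = parent;
    if (parent != NULL)
        parent->refs++;
    return e;
}

/* A request starts using the entry. */
static void toy_cache_acquire(toy_cache_entry *e)
{
    assert(e != NULL);
    e->refs++;
}

/* Destroy e if it is stale and unreferenced, then give the parent the
 * same chance.  Descendants therefore always go before ancestors.
 * Returns the number of entries destroyed. */
static int toy_cache_reap(toy_cache_entry *e)
{
    int destroyed = 0;
    while (e != NULL && e->stale && e->refs == 0) {
        toy_cache_entry *parent = e->parent;
        free(e);
        destroyed++;
        if (parent != NULL)
            parent->refs--;
        e = parent;
    }
    return destroyed;
}

/* A request is done with the entry. */
static int toy_cache_release(toy_cache_entry *e)
{
    assert(e != NULL && e->refs > 0);
    e->refs--;
    return toy_cache_reap(e);
}

/* Set the entry aside; it is destroyed now only if nothing uses it. */
static int toy_cache_mark_stale(toy_cache_entry *e)
{
    assert(e != NULL);
    e->stale = 1;
    return toy_cache_reap(e);
}
```

Marking a parent stale while a child still references it destroys nothing.  Once the
last request releases the stale child, the child is destroyed first and the parent
follows in the same pass.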
