httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ruediger Pluem <rpl...@apache.org>
Subject Re: cache - cleaning up mod_memcache and making other caches their live easier
Date Sun, 10 Feb 2008 21:45:16 GMT


On 02/10/2008 05:39 PM, Dirk-Willem van Gulik wrote:
> 

> 
> See below for some aloud thinking -- note that this is not yet well 
> thought out (all I did was try to minimize the overlap between 
> disk_cache, memcached, distcache and some commercial thing -- and tried 
> to move as much RFC2616 knowledge into mod_cache.c*).
> 
> -    internal ap_cache_cacheable_hdr to hold all the RFC 2616
>         non cachable headers info as before.
> 
> -    ap_cache_cacheable_hdrs_out introduced as per earlier discussion
>     which knows about error headers and mandatory content setting.
> 
> -    ap_cache_cacheable_hdrs_in introduced - which has an optional
>     argument to further curtail the headers returned by just those
>     in the 'Vary'.
> 
> -    ap_normalize_vary_to_array() introduced. Which can help
>     caching storage modules to create a key across the Vary
>     relevant fields, if any.
> 
>     => given that they all have to create this -- wondering if
>     we should make this part of the cache_info struct already.

Maybe not a bad idea.

> 
> -    Likewise a ap_generate_vary_key() which understands
>     vary if/when needed.
> 
> Below more or less cleans mod_disk_cache completly of having to 
> understand http (or any protocol aspect) - but for 
> ap_generate_vary_key() -- but it is not yet HTTP neutral (i.e. a IMAP 
> protocol module would have to overwrite things it has not yet XS to).
> 
> Thoughts,
> 
> Dw.
> 

Some comments:

1. Some style nitpicks (especially indenting, sometimes typos in the comments) that
    make the patch sometimes hard to read.
2. Don't forget to add a minor bump once you commit.
3. While it may be a worthwhile goal to improve the error handling of mod_disk_cache
    I would love to see this addressed in a separate patch as these pieces of code distract
    while reading the patch and are not strictly connected to your goal of abstracting
    logic out of mod_disk_cache.
4. Looks good in general, some more questions inline

> 
> *: not yet nearly enough - i.e. lacking the whole case (in)sensitive, 
> date, etc stuff.
> 
> Index: cache_util.c
> ===================================================================
> --- cache_util.c    (revision 620288)
> +++ cache_util.c    (working copy)
> @@ -571,10 +573,10 @@
>      return apr_pstrdup(p, hashfile);
>  }
> 
> -/* Create a new table consisting of those elements from an input
> +/* Create a new table consisting of those elements from an
>   * headers table that are allowed to be stored in a cache.
>   */
> -CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_out(apr_pool_t *pool,
> +CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs(apr_pool_t *pool,
>                                                          apr_table_t *t,
>                                                          server_rec *s)
>  {
> @@ -582,12 +584,17 @@
>      char **header;
>      int i;
> 
> +    if (t == NULL) {
> +    return NULL;
> +    };
> +
>      /* Make a copy of the headers, and remove from
>       * the copy any hop-by-hop headers, as defined in Section
>       * 13.5.1 of RFC 2616
>       */
>      apr_table_t *headers_out;
>      headers_out = apr_table_copy(pool, t);
> +
>      apr_table_unset(headers_out, "Connection");
>      apr_table_unset(headers_out, "Keep-Alive");
>      apr_table_unset(headers_out, "Proxy-Authenticate");
> @@ -599,6 +606,7 @@
> 
>      conf = (cache_server_conf *)ap_get_module_config(s->module_config,
>                                                       &cache_module);
> +
>      /* Remove the user defined headers set with CacheIgnoreHeaders.
>       * This may break RFC 2616 compliance on behalf of the administrator.
>       */
> @@ -608,3 +616,170 @@
>      }
>      return headers_out;
>  }
> +
> +/* Create a new table consisting of those elements from an input
> + * headers table that are allowed to be stored in a cache. Optionally
> + * filtered to just the fields passed in the vary_filter array.

Why this? Do you want to save space by not storing the input headers or
only storing those relevant to varying?
 From a first glance this looks like to be a good idea, but it may be better
separated to another patch to ease reading. This could possibly used later
to ease the logic in cache_select in cache_storage.c, as today its vary logic
is unneeded IMHO in the mod_disk_cache case. IMHO I cannot open the mod_disk_cache
entity if the cached resource had vary headers and the cached headers do not
match the ones from the incoming request. I would not have a hdrs file in this
case since the path to it is based on the hashed values of the input headers
that vary.

> + *
> + * @return a table with the valid input headers or NULL if none are 
> relevant.
> + */
> +CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_in(request_rec * r, 
> apr_array_header_t * vary_filter)
> +{
> +    apr_table_t *headers_in, * vary_filtered;
> +        int i;
> +
> +    if (r->headers_in == NULL)
> +        return NULL;
> +
> +        headers_in = ap_cache_cacheable_hdrs(r->pool, r->headers_in, 
> r->server);
> +   
> +    if (!vary_filter)
> +        return apr_is_empty_table(headers_in) ? NULL: headers_in;
> +
> +        /* The vary array is most propably the smallest of the two - su
> +     * only copy just those fields across.
> +     */
> +         vary_filtered = apr_table_make(r->pool, vary_filter->nelts);
> +        for (i = 0; i < vary_filter->nelts; i++) {
> +                const char *key = ((const char**)vary_filter->elts)[i];
> +                const char * val = apr_table_get(headers_in, key); /* 
> XXX: Not case insensitive */
> +                if (val)
> +                        apr_table_add(vary_filtered, key, val);
> +        };
> +
> +    return apr_is_empty_table(vary_filtered) ? NULL : vary_filtered;
> +}
> +
> +/* Create a new table consisting of those elements from an output
> + * headers table that are allowed to be stored in a cache;
> + * ensure there is a content type and capture any errors.
> + */
> +CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_out(request_rec * r)
> +{
> +    apr_table_t *headers_out;
> +
> +    headers_out = ap_cache_cacheable_hdrs(r->pool, r->headers_out,
> +                                                  r->server);
> +
> +        if (!apr_table_get(headers_out, "Content-Type")
> +            && r->content_type) {
> +            apr_table_setn(headers_out, "Content-Type",
> +                           ap_make_content_type(r, r->content_type));
> +        }
> +
> +        headers_out = apr_table_overlay(r->pool, headers_out,
> +                                        r->err_headers_out);
> +
> +    return headers_out;
> +}
> +
> +static int array_alphasort(const void *fn1, const void *fn2)
> +{
> +    return strcmp(*(char**)fn1, *(char**)fn2);
> +}
> +
> +static void vary_tokens_to_array(apr_pool_t *p, const char *data,
> +                            apr_array_header_t *arr)
> +{
> +    char *token;
> +
> +    while ((token = ap_get_list_item(p, &data)) != NULL) {
> +        *((const char **) apr_array_push(arr)) = token;
> +    }
> +
> +    qsort((void *) arr->elts, arr->nelts,
> +         sizeof(char *), array_alphasort);
> +}
> +
> +/* Given a header table; extract any Vary present and return
> + * the values in a predictable way (i.e. Sort it so that
> + * "Vary: A, B" and "Vary: B, A" are stored the same; and take
> + * care of any case sensitive issues. Onb exit the array will
> + * either be NULL (no VARYance) or a sorted array.
> + *
> + * @return vary-array or NULL, if no variance
> + */
> +CACHE_DECLARE(apr_array_header_t*) 
> ap_normalize_vary_to_array(apr_pool_t *p, apr_table_t * headers)
> +{
> +    apr_array_header_t* varray = NULL;
> +    const char *vary;
> +   
> +    if ((headers) && ((vary= apr_table_get(headers, "Vary")) != NULL))
> +    {
> +        varray = apr_array_make(p, 6, sizeof(char*));
> +            vary_tokens_to_array(p, vary, varray);
> +    }
> +   
> +    return varray;
> +}
> +
> +/* Generate a new key based on the normal key (URI) and normalized 
> values from
> + * the Vary array. The vary array is assumed to be in a stable order 
> and format.
> + */
> +CACHE_DECLARE(const char*) ap_generate_vary_key(apr_pool_t *p, 
> apr_table_t * input_headers,
> +                             apr_array_header_t *varray, const char 
> *oldkey)
> +{
> +    struct iovec *iov;
> +    int i, k;
> +    int nvec;
> +    const char *header;
> +    const char **elts;
> +
> +    if (varray == NULL);
> +    return oldkey;
> +
> +    nvec = (varray->nelts * 2) + 1;
> +    iov = apr_palloc(p, sizeof(struct iovec) * nvec);
> +    elts = (const char **) varray->elts;
> +
> +    /* TODO:
> +     *    - Handle multiple-value headers better. (sort them?)
> +     *    - Handle Case in-sensitive Values better.
> +     *        This isn't the end of the world, since it just lowers the 
> cache
> +     *        hit rate, but it would be nice to fix.
> +     *
> +     * The majority are case insenstive if they are values (encoding etc).
> +     * Most of rfc2616 is case insensitive on header contents.
> +     *
> +     * So the better solution may be to identify headers which should be
> +     * treated case-sensitive?
> +     *  HTTP URI's (3.2.3) [host and scheme are insensitive]
> +     *  HTTP method (5.1.1)
> +     *  HTTP-date values (3.3.1)
> +     *  3.7 Media Types [exerpt]
> +     *     The type, subtype, and parameter attribute names are case-
> +     *     insensitive. Parameter values might or might not be 
> case-sensitive,
> +     *     depending on the semantics of the parameter name.
> +     *  4.20 Except [exerpt]
> +     *     Comparison of expectation values is case-insensitive for 
> unquoted
> +     *     tokens (including the 100-continue token), and is 
> case-sensitive for
> +     *     quoted-string expectation-extensions.
> +     *
> +     *  XX we are currently concatenating the values. With some 
> puzzling one could
> +     *     set a header value which looks like the 'next/previous' 
> value - and hence
> +     *     cause a value which is not the 'real' one. This may be a 
> security risk.
> +     *     Ideally we should use a special value which cannot occur in 
> a header
> +     *     legally (or is escaped/wacked). Given that (at least) a 
> Header cannot
> +     *     contain a space or ':' - may be an idea to insert those.
> +     */

I currently do not understand your worries here. Could you please explain this
in more detail?



> -    /* Parse the vary header and dump those fields from the headers_in. */
> -    /* FIXME: Make call to the same thing cache_select calls to crack 
> Vary. */
> -    if (r->headers_in) {
> +    /* Parse the vary header and dump those fields from the headers_in.
> +     * But only do so if needed - i.e. there is a Vary - and then limit
> +     * it to the headers needed to validate the Vary.
> +     */

See my comment above regarding vary filtering.

> +    if (r->headers_in && varray) {
>          apr_table_t *headers_in;
> -
> -        headers_in = ap_cache_cacheable_hdrs_out(r->pool, r->headers_in,
> -                                                 r->server);
> +        headers_in = ap_cache_cacheable_hdrs_in(r, varray);
> +        if (headers_in) {
>          rv = store_table(dobj->hfd, headers_in);
>          if (rv != APR_SUCCESS) {
> +                 apr_file_close(dobj->hfd); /* flush and close */
> +                 apr_file_remove(dobj->tempfile, r->pool);
>              return rv;
>          }
>      }
> +    }

Regards

RĂ¼diger

Mime
View raw message