From: Brian Pane
Date: Mon, 02 Sep 2002 01:45:13 -0700
To: dev@httpd.apache.org
Subject: [PATCH] mod_cache: support caching of streamed responses
Message-ID: <3D732519.4050601@apache.org>

This patch allows mod_cache to cache a response that isn't all contained
in the first brigade sent through cache_in_filter().  (This scenario
happens, for example, with reverse-proxy requests, CGIs, and various app
server connectors.)

Can someone familiar with the caching code please scrutinize the patch
and send comments?  There's one known problem with the code (details
below), but it's otherwise ready for review.

The way it works is:

* If the first call to cache_in_filter() includes an EOS, the operation
  is unchanged: put the response in the cache (assuming it meets all the
  other cacheability criteria) and pass the brigade to the next filter.

* If the brigade passed to cache_in_filter() doesn't have an EOS,
  though, the filter now makes a copy of the brigade and passes the
  original on to the next filter.  In subsequent calls,
  cache_in_filter() continues buffering up copies of the buckets while
  streaming the response on to the next filter in the chain.
  Once it finally sees an EOS, it concatenates the setaside buckets
  into a single brigade for storage in the cache.  It then streams the
  last bit of the output to the next filter.

* If the response contains any bucket with length==-1 (an unread pipe
  bucket, for example), the filter doesn't attempt to cache it.  (It's
  probably possible to add support for this in the future.)

The one thing that's missing is a check to avoid setting aside too much
data.  If the total size of the setaside buckets would exceed the
maximum object size for the cache, cache_in_filter() should immediately
discard the setaside buckets and give up on caching the response.

What's the right way to implement this check?  It looks like the max
object size is a property of each specific cache implementation
(mod_mem_cache, mod_disk_cache).  The first solution that comes to mind
is to make each cache implementation provide a callback function that
says whether it's willing to cache an object of size X.  Is there a
cleaner solution?

-Brian

[Attachment: cache_patch.txt]

Index: modules/experimental/mod_cache.h
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/experimental/mod_cache.h,v
retrieving revision 1.31
diff -u -r1.31 mod_cache.h
--- modules/experimental/mod_cache.h	27 Aug 2002 19:22:45 -0000	1.31
+++ modules/experimental/mod_cache.h	2 Sep 2002 08:06:19 -0000
@@ -237,6 +237,11 @@
     int fresh;				/* is the entitey fresh? */
     cache_handle_t *handle;		/* current cache handle */
     int in_checked;			/* CACHE_IN must cache the entity */
+    apr_bucket_brigade *saved_brigade;	/* copy of partial response */
+    apr_off_t saved_size;		/* length of saved_brigade */
+    apr_time_t exp;			/* expiration */
+    apr_time_t lastmod;			/* last-modified time */
+    cache_info *info;			/* current cache info */
 } cache_request_rec;

Index: modules/experimental/mod_cache.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/experimental/mod_cache.c,v
retrieving revision 1.55
diff -u -r1.55 mod_cache.c
--- modules/experimental/mod_cache.c	1 Sep 2002 23:50:42 -0000	1.55
+++ modules/experimental/mod_cache.c	2 Sep 2002 08:06:20 -0000
@@ -430,6 +430,7 @@
     void *scache = r->request_config;
     cache_request_rec *cache = (cache_request_rec *)
         ap_get_module_config(scache, &cache_module);
+    apr_bucket *split_point = NULL;

     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, f->r->server,
@@ -450,6 +451,16 @@
         ap_set_module_config(r->request_config, &cache_module, cache);
     }

+    /* If we've previously processed and set aside part of this
+     * response, skip the cacheability checks
+     */
+    if (cache->saved_brigade != NULL) {
+        exp = cache->exp;
+        lastmod = cache->lastmod;
+        info = cache->info;
+    }
+    else {
+
     /*
      * Pass Data to Cache
      * ------------------
@@ -579,6 +590,7 @@
         return ap_pass_brigade(f->next, in);
     }
     cache->in_checked = 1;
+    } /* if cache->saved_brigade==NULL */

     /* Set the content length if known.  We almost certainly do NOT want to
      * cache streams with unknown content lengths in the in-memory cache.
@@ -599,6 +611,7 @@
      */
     apr_bucket *e;
     int all_buckets_here=0;
+    int unresolved_length = 0;
     size=0;
     APR_BRIGADE_FOREACH(e, in) {
         if (APR_BUCKET_IS_EOS(e)) {
@@ -606,6 +619,7 @@
             break;
         }
         if (APR_BUCKET_IS_FLUSH(e)) {
+            unresolved_length = 1;
             continue;
         }
         if (e->length < 0) {
@@ -615,7 +629,68 @@
     }

     if (!all_buckets_here) {
-        size = -1;
+        if (unresolved_length) {
+            /* There was a bucket of unknown length,
+             * probably a pipe or socket bucket, in
+             * the brigade.  Give up on caching this
+             * response.
+             */
+            if (cache->saved_brigade != NULL) {
+                apr_brigade_destroy(cache->saved_brigade);
+                cache->saved_brigade = NULL;
+                cache->saved_size = 0;
+            }
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, in);
+        }
+
+        /* XXX We need to check if the saved brigade size
+         * plus the size of the buckets just scanned (which
+         * we're about to add to the saved brigade) is too
+         * large for the cache.
+         */
+
+        /* Add a copy of the new brigade's buckets to the
+         * saved brigade.  The reason for the copy is so
+         * that we can output the new buckets immediately,
+         * rather than having to buffer up the entire
+         * response before sending anything.
+         */
+        if (cache->saved_brigade == NULL) {
+            cache->saved_brigade =
+                apr_brigade_create(r->pool,
+                                   r->connection->bucket_alloc);
+            cache->exp = exp;
+            cache->lastmod = lastmod;
+            cache->info = info;
+        }
+        APR_BRIGADE_FOREACH(e, in) {
+            apr_bucket *copy;
+            apr_bucket_copy(e, &copy);
+            APR_BRIGADE_INSERT_TAIL(cache->saved_brigade, copy);
+        }
+        cache->saved_size += size;
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
+                     "cache: No length yet, setting aside "
+                     "content for url: %s", url);
+
+        return ap_pass_brigade(f->next, in);
+    }
+    else {
+        /* Now that we've seen an EOS, it's appropriate
+         * to try caching the response.  If any content
+         * has been copied into cache->saved_brigade in
+         * previous passes through this filter, the
+         * content placed in the cache must be the
+         * concatenation of the saved brigade and the
+         * current brigade.
+         */
+        if (cache->saved_brigade != NULL) {
+            split_point = APR_BRIGADE_FIRST(in);
+            APR_BRIGADE_CONCAT(cache->saved_brigade, in);
+            in = cache->saved_brigade;
+            size += cache->saved_size;
+        }
+    }
     }
 }
@@ -658,6 +733,11 @@
     if (rv != OK) {
         /* Caching layer declined the opportunity to cache the response */
         ap_remove_output_filter(f);
+        if (split_point) {
+            apr_bucket_brigade *already_sent = in;
+            in = apr_brigade_split(in, split_point);
+            apr_brigade_destroy(already_sent);
+        }
         return ap_pass_brigade(f->next, in);
     }
@@ -753,6 +833,11 @@
     }
     if (rv != APR_SUCCESS) {
         ap_remove_output_filter(f);
+    }
+    if (split_point) {
+        apr_bucket_brigade *already_sent = in;
+        in = apr_brigade_split(in, split_point);
+        apr_brigade_destroy(already_sent);
     }
     return ap_pass_brigade(f->next, in);
 }