From: Brian Pane
Date: Mon, 02 Sep 2002 01:45:13 -0700
To: dev@httpd.apache.org
Subject: [PATCH] mod_cache: support caching of streamed responses
Message-ID: <3D732519.4050601@apache.org>

This patch allows mod_cache to cache a response that isn't all contained
in the first brigade sent through cache_in_filter().  (This scenario
happens, for example, with reverse-proxy requests, CGIs, and various app
server connectors.)

Can someone familiar with the caching code please scrutinize the patch
and send comments?  There's one known problem with the code (details
below), but it's otherwise ready for review.

The way it works is:

* If the first call to cache_in_filter() includes an EOS, the operation
  is unchanged: put the response in the cache (assuming it meets all the
  other cacheability criteria) and pass the brigade to the next filter.

* If the brigade passed to cache_in_filter() doesn't have an EOS,
  though, the filter now makes a copy of the brigade and passes the
  original on to the next filter.  In subsequent calls,
  cache_in_filter() continues buffering up copies of the buckets while
  streaming the response on to the next filter in the chain.
  Once it finally sees an EOS, it concatenates the setaside buckets
  into a single brigade for storage in the cache.  It then streams the
  last bit of the output to the next filter.

* If the response contains any bucket with length==-1 (an unread pipe
  bucket, for example), the filter doesn't attempt to cache it.  (It's
  probably possible to add support for this in the future.)

The one thing that's missing is a check to avoid setting aside too much
data.  If the total size of the setaside buckets would exceed the
maximum object size for the cache, cache_in_filter() should immediately
discard the setaside buckets and give up on caching the response.

What's the right way to implement this check?  It looks like the max
object size is a property of each specific cache implementation
(mod_mem_cache, mod_disk_cache).  The first solution that comes to mind
is to make each cache implementation provide a callback function that
says whether it's willing to cache an object of size X.  Is there a
cleaner solution?

-Brian

[Attachment: cache_patch.txt]

Index: modules/experimental/mod_cache.h
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/experimental/mod_cache.h,v
retrieving revision 1.31
diff -u -r1.31 mod_cache.h
--- modules/experimental/mod_cache.h	27 Aug 2002 19:22:45 -0000	1.31
+++ modules/experimental/mod_cache.h	2 Sep 2002 08:06:19 -0000
@@ -237,6 +237,11 @@
     int fresh;				/* is the entitey fresh? */
     cache_handle_t *handle;		/* current cache handle */
     int in_checked;			/* CACHE_IN must cache the entity */
+    apr_bucket_brigade *saved_brigade;	/* copy of partial response */
+    apr_off_t saved_size;		/* length of saved_brigade */
+    apr_time_t exp;			/* expiration */
+    apr_time_t lastmod;			/* last-modified time */
+    cache_info *info;			/* current cache info */
 } cache_request_rec;

Index: modules/experimental/mod_cache.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/experimental/mod_cache.c,v
retrieving revision 1.55
diff -u -r1.55 mod_cache.c
--- modules/experimental/mod_cache.c	1 Sep 2002 23:50:42 -0000	1.55
+++ modules/experimental/mod_cache.c	2 Sep 2002 08:06:20 -0000
@@ -430,6 +430,7 @@
     void *scache = r->request_config;
     cache_request_rec *cache = (cache_request_rec *)
         ap_get_module_config(scache, &cache_module);
+    apr_bucket *split_point = NULL;

     ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, f->r->server,
@@ -450,6 +451,16 @@
         ap_set_module_config(r->request_config, &cache_module, cache);
     }

+    /* If we've previously processed and set aside part of this
+     * response, skip the cacheability checks
+     */
+    if (cache->saved_brigade != NULL) {
+        exp = cache->exp;
+        lastmod = cache->lastmod;
+        info = cache->info;
+    }
+    else {
+
     /*
      * Pass Data to Cache
      * ------------------
@@ -579,6 +590,7 @@
         return ap_pass_brigade(f->next, in);
     }
     cache->in_checked = 1;
+    } /* if cache->saved_brigade==NULL */

     /* Set the content length if known.  We almost certainly do NOT want to
      * cache streams with unknown content lengths in the in-memory cache.
@@ -599,6 +611,7 @@
      */
     apr_bucket *e;
     int all_buckets_here=0;
+    int unresolved_length = 0;
     size=0;
     APR_BRIGADE_FOREACH(e, in) {
         if (APR_BUCKET_IS_EOS(e)) {
@@ -606,6 +619,7 @@
             break;
         }
         if (APR_BUCKET_IS_FLUSH(e)) {
+            unresolved_length = 1;
             continue;
         }
         if (e->length < 0) {
@@ -615,7 +629,68 @@
     }

     if (!all_buckets_here) {
-        size = -1;
+        if (unresolved_length) {
+            /* There was a bucket of unknown length,
+             * probably a pipe or socket bucket, in
+             * the brigade.  Give up on caching this
+             * response.
+             */
+            if (cache->saved_brigade != NULL) {
+                apr_brigade_destroy(cache->saved_brigade);
+                cache->saved_brigade = NULL;
+                cache->saved_size = 0;
+            }
+            ap_remove_output_filter(f);
+            return ap_pass_brigade(f->next, in);
+        }
+
+        /* XXX We need to check if the saved brigade size
+         * plus the size of the buckets just scanned (which
+         * we're about to add to the saved brigade) is too
+         * large for the cache.
+         */
+
+        /* Add a copy of the new brigade's buckets to the
+         * saved brigade.  The reason for the copy is so
+         * that we can output the new buckets immediately,
+         * rather than having to buffer up the entire
+         * response before sending anything.
+         */
+        if (cache->saved_brigade == NULL) {
+            cache->saved_brigade =
+                apr_brigade_create(r->pool,
+                                   r->connection->bucket_alloc);
+            cache->exp = exp;
+            cache->lastmod = lastmod;
+            cache->info = info;
+        }
+        APR_BRIGADE_FOREACH(e, in) {
+            apr_bucket *copy;
+            apr_bucket_copy(e, &copy);
+            APR_BRIGADE_INSERT_TAIL(cache->saved_brigade, copy);
+        }
+        cache->saved_size += size;
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
+                     "cache: No length yet, setting aside "
+                     "content for url: %s", url);
+
+        return ap_pass_brigade(f->next, in);
+    }
+    else {
+        /* Now that we've seen an EOS, it's appropriate
+         * to try caching the response.  If any content
+         * has been copied into cache->saved_brigade in
+         * previous passes through this filter, the
+         * content placed in the cache must be the
+         * concatenation of the saved brigade and the
+         * current brigade.
+         */
+        if (cache->saved_brigade != NULL) {
+            split_point = APR_BRIGADE_FIRST(in);
+            APR_BRIGADE_CONCAT(cache->saved_brigade, in);
+            in = cache->saved_brigade;
+            size += cache->saved_size;
+        }
+    }
     }
 }
@@ -658,6 +733,11 @@
     if (rv != OK) {
         /* Caching layer declined the opportunity to cache the response */
         ap_remove_output_filter(f);
+        if (split_point) {
+            apr_bucket_brigade *already_sent = in;
+            in = apr_brigade_split(in, split_point);
+            apr_brigade_destroy(already_sent);
+        }
         return ap_pass_brigade(f->next, in);
     }
@@ -753,6 +833,11 @@
     }
     if (rv != APR_SUCCESS) {
         ap_remove_output_filter(f);
+    }
+    if (split_point) {
+        apr_bucket_brigade *already_sent = in;
+        in = apr_brigade_split(in, split_point);
+        apr_brigade_destroy(already_sent);
     }
     return ap_pass_brigade(f->next, in);
 }