httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Graham Leggett <minf...@sharp.fm>
Subject Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Date Sun, 06 Jun 2010 11:35:21 GMT
On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:

>>> IMHO it does not (at least according to the comments and the code
>>> looks like
>>> to follow these):
>>
>> This is only present on trunk, and this needs to be fixed too. The
>> problem we saw was in httpd v2.2.
>
> A similar code is part of 2.2. Hence I am still astonished why you
> see this problem in 2.2.

I think we're talking across purposes, the code in the most recent  
v2.2 branch I am looking at looks like this with the comments stripped:

     if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
         && r->status != HTTP_MULTIPLE_CHOICES
         && r->status != HTTP_MOVED_PERMANENTLY
         && r->status != HTTP_NOT_MODIFIED) {
         if (exps != NULL || cc_out != NULL) {
// noop
         }
         else {
             reason = apr_psprintf(p, "Response status %d", r->status);
         }
     }

If the request survives to the end of this code with "reason" still  
NULL, the response is still cacheable at this point.

A 206 always gets past the first if, and is then tested against the  
second. If an Expires or a Cache-Control is present (and this is true  
in many cases, and particularly in our case where the news people  
found the bug), the "noop" path is followed, and we reach the end of  
the if statements with reason still NULL. The cache implementation  
then duly incorrectly caches the partial response.

The reason we don't see this problem very often is because we don't  
see a 206 very often at this point in the code. The filter that does  
content ranges sits after the CACHE_SAVE filter, and so under most  
circumstances, the response is still a complete 200 OK when it gets  
here, and the cache caches the full 200 OK response.

If we reverse proxy to an application server of some kind, many  
applications don't consider the range request at all, and return a 200  
OK anyway. Again, the cache does "the right thing" at this point and  
caches the complete response, even if the browser has requested a  
range, the range filter turning the response into a range response  
after the cache has finished caching the full response.

It's only when we reverse proxy to another server that understands and  
responds to range requests itself that we will see a 206 at this  
point, and this isn't a common configuration.

Because the cache doesn't know any better, it passes the 206 back to  
mod_disk_cache. Because mod_disk_cache previously didn't know better  
either, it cached the 206 (this is now fixed). Subsequent ordinary  
requests were handed cached 206 partial responses in return even  
though the browser asked for a normal request, until an end user goes  
"huh", clicks "shift reload" and the cache fixes itself again.

The fix to this problem comes in two parts, the first is for  
mod_disk_cache to say "I explicitly decline to cache range requests, I  
don't yet know how", the second part is to make the comments in  
mod_cache match the code: The comments in mod_cache say that a 206 is  
not cacheable, the code said it is, but only if CC or Expires is  
present. You then fixed this on trunk by banning 206 caching entirely  
for all mod_cache implementations, but this comes at the price of  
removing the option for an implementation to choose to cache a range  
request if it so wished.

> I am currently worried that a fresh cached copy of a 206 response  
> cannot
> be updated accordingly if a full response is requested. That puts  
> the burden
> of this stuff on each provider instead of some general framework  
> solution
> in mod_cache. So I think mod_cache should provide a better framework  
> for
> 206 responses first before providers should implement it.

I don't follow what mod_cache would need to do in this case with  
respect to a 206 that makes a cache implementation require special  
support.

If a cache implementation wanted to, it could cache the 206, marking  
the range in the cache as appropriate. When a full request came along,  
the cache could potentially modify the request to ask for the missing  
range, and then decline the opportunity to serve from cache. The  
backend supplies the missing range, and the cache_save filter saves  
the missing range to complete the request, and inserts the existing  
cached data into the response as a simple file bucket, completing the  
response.

If we did find a way to support a generic 206 mechanism in mod_cache,  
that would be great (assuming it was necessary), but the opposite,  
banning any caching implementation from even attempting to cache a 206  
by enforcing a ban in mod_cache itself isn't ideal, if that makes sense?

Regards,
Graham
--


Mime
View raw message