httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject Re: [PATCH] fix incorrect 304's responses when cache is unwritable
Date Sun, 07 Aug 2005 22:45:21 GMT

Colm MacCarthaigh wrote:
> On Sun, Aug 07, 2005 at 09:59:15PM +0200, wrote:
>>As you already mentioned the remove_url implementation
>>of mod_disk_cache is currently an empty dummy :-).
> I've been thinking about that, but it's not entirely as easy as it first
> seems, or indeed as htcacheclean wrongly assumes.
> unlinking the data/header files is not good enough, directories are also
> resources, and consume space in the inode table. Also, most filesystems
> slow down as the ammount of hard-links in any directory increases. 

Indeed a very valid point. So, yes as far as possible all directories should be
removed, when the header and the data files get removed.
But additionally my experience with some filesystem types is that removing
files from a directory is not enough to speed things up, because the directories
itself do not shrink even if entries get removed from the directory
(I noticed this behaviour with ext3 on Linux and ufs / Veritas FS on Solaris, wheras
reiserfs shrinks). This could pose a problem to the cache root where all the
temporary data files get created before being moved to the correct hashed directory.


> Because of Windows and non-standard filesystems like AFS (read the GNU
> find(1) manpage section on -noleaf) assumptions about the directory link
> counts can't be made, which means a full-scale readdir() and directory
> traversal to remove the cache entry. This can be pretty a prettly slow
> operation. Possibly slow enough to merit implementing it in an
> APR_HOOK_REALLY_LAST hook, so that it can avoid slowing down the request
> serving.

Is a traversal really needed? What about going back the full path of the
header / data file to the cache root and removing each component on the
way by calling apr_dir_remove on each component until it fails?


>>I would really appreciate if you find some time to review my patch.
> I'm not a committer, so my review is only informational. But I'm
> familiar enough with the cache mode, I've been running and patching it
> for my own purposes for years, so I've taken a look.

I also appreciate comments from non-commiters, as

 - this will (hopefully) draw the attention of the commiters.
 - improves the patch

> I'm not really sure what it is aiming to do in relation to 404's.  404's
> are never saved to the cache in the first place, the check in
> mod_cache.c sorts that out;
>     if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
>         && r->status != HTTP_MULTIPLE_CHOICES
>         && r->status != HTTP_MOVED_PERMANENTLY
>         && r->status != HTTP_NOT_MODIFIED) {
>         /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410
>          * We don't cache 206, because we don't (yet) cache partial responses.
>          * We include 304 Not Modified here too as this is the origin server
>          * telling us to serve the cached copy.
>          */
>         reason = apr_psprintf(p, "Response status %d", r->status);
>     }
> Are 404's being served incorrectly in some circumstances? 

You are right that 404's do not get cached. But if a cached resource vanishes on the backend
the cache entry is not removed. This

- fills up the cache
- leads to the delivery of the cached resource when the request does not force mod_cache
  to revalidate the entry. This is even the case *if* there had been a request before which
  forced mod_cache to revalidate the cache entry and the revalidation found out that the resource
  has vanished on the backend.

> In any case;
> Your remove_url code in mod_disk_cache takes the approach of just
> unlinking the data and header file I discussed above. But well, since
> so does htcacheclean, that's no reflection on the quality of the patch.

See my comments above. Your points are very valid. As I do not want to put
too much into a single patch I currently do not intend to adjust this, but
I definitely will investigate this for a second step.

> I don't think the cache_removal_url filter is neccessary, the
> cache_save_filter already has a lot of code in place for handling
> the case of a stale cache handle where I think it is better placed.

It is needed because:

- In the case of an internal Apache 404 error page the content filter chain
  is not run (especially not CACHE_SAVE_FILTER). This is the reason why
  cache_removal_url is a protocol filter.

- In the case of an user specified error page with ErrorDocument the CACHE_SAVE_FILTER
  is run with the wrong request (the one that belongs to the custom error page, not
  the one of the original request).




View raw message