httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Colm MacCarthaigh <c...@stdlib.net>
Subject Re: [PATCH] fix incorrect 304's responses when cache is unwritable
Date Sun, 07 Aug 2005 21:29:24 GMT
On Sun, Aug 07, 2005 at 09:59:15PM +0200, r.pluem@t-online.de wrote:
> As you already mentioned the remove_url implementation
> of mod_disk_cache is currently an empty dummy :-).

I've been thinking about that, but it's not entirely as easy as it first
seems, or indeed as htcacheclean wrongly assumes.

unlinking the data/header files is not good enough, directories are also
resources, and consume space in the inode table. Also, most filesystems
slow down as the ammount of hard-links in any directory increases. 

The hash function used by mod_cache is 128-bits in size, which given any
conceivable settings for CacheDirLevels and CacheDirLength is still
going to be more than enough to exhaust the inode table on any rational
filesystem. Considering that one of the main uses of mod_cache is for
mod_proxy users, where there is an infinity of keys (urls) this means
that the filesystem will eventually become unusable unless the
adminstator periodically weighs in an removes empty directories
manually.

Because of Windows and non-standard filesystems like AFS (read the GNU
find(1) manpage section on -noleaf) assumptions about the directory link
counts can't be made, which means a full-scale readdir() and directory
traversal to remove the cache entry. This can be pretty a prettly slow
operation. Possibly slow enough to merit implementing it in an
APR_HOOK_REALLY_LAST hook, so that it can avoid slowing down the request
serving.

An alternative which I think was discussed here is to create a cgid-like
process which deals with this kind of task, aswell as more complex
things like managing just-to-expire files. But that kind of steps on the
toes of the cache_requester SoC work.

Either way, implementing an immediate unlink() on the files, just to be
able to return useful things about the writability of the filesystem,
but leave the more complex directory cleanup until the file has been
served is what I'm thinking of.

For what it's worth though, htcacheclean itself has this massive bug,
and does not do any directory cleanup, so your patch isn't alone in
doing this.

> I had a similar problem with 404 responses, and wrote a patch for this which is
> currently in discussion (attached patch again to this mail):
> 
> http://mail-archives.apache.org/mod_mbox/httpd-dev/200507.mbox/%3c42DF9B4C.5080604@apache.org%3e
> 
> It actually does implement a removal of the files in mod_disk_cache and
> should also handle your problem. If it does not, I am pretty sure that a small modification
> to the patch would do it.
> 
> I would really appreciate if you find some time to review my patch.

I'm not a committer, so my review is only informational. But I'm
familiar enough with the cache mode, I've been running and patching it
for my own purposes for years, so I've taken a look.

I'm not really sure what it is aiming to do in relation to 404's.  404's
are never saved to the cache in the first place, the check in
mod_cache.c sorts that out;

    if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
        && r->status != HTTP_MULTIPLE_CHOICES
        && r->status != HTTP_MOVED_PERMANENTLY
        && r->status != HTTP_NOT_MODIFIED) {
        /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410
         * We don't cache 206, because we don't (yet) cache partial responses.
         * We include 304 Not Modified here too as this is the origin server
         * telling us to serve the cached copy.
         */
        reason = apr_psprintf(p, "Response status %d", r->status);
    }

Are 404's being served incorrectly in some circumstances? 

In any case;

Your remove_url code in mod_disk_cache takes the approach of just
unlinking the data and header file I discussed above. But well, since
so does htcacheclean, that's no reflection on the quality of the patch.

I don't think the cache_removal_url filter is neccessary, the
cache_save_filter already has a lot of code in place for handling
the case of a stale cache handle where I think it is better placed.

The (much smaller) patch I've just submitted, plus your changes to
mod_disk_cache, mod_mem_cache and cache_storage.c would do the same job
with a third of the code :)

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net

Mime
View raw message