Return-Path: Delivered-To: apmail-httpd-dev-archive@www.apache.org Received: (qmail 71951 invoked from network); 7 Aug 2005 22:45:38 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 7 Aug 2005 22:45:38 -0000 Received: (qmail 20760 invoked by uid 500); 7 Aug 2005 22:45:25 -0000 Delivered-To: apmail-httpd-dev-archive@httpd.apache.org Received: (qmail 20733 invoked by uid 500); 7 Aug 2005 22:45:25 -0000 Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm Precedence: bulk Reply-To: dev@httpd.apache.org list-help: list-unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@httpd.apache.org Received: (qmail 20720 invoked by uid 99); 7 Aug 2005 22:45:25 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 07 Aug 2005 15:45:25 -0700 X-ASF-Spam-Status: No, hits=0.2 required=10.0 tests=NO_REAL_NAME X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: local policy) Received: from [194.25.134.83] (HELO mailout07.sul.t-online.com) (194.25.134.83) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 07 Aug 2005 15:45:15 -0700 Received: from fwd28.aul.t-online.de by mailout07.sul.t-online.com with smtp id 1E1ttT-0008EP-01; Mon, 08 Aug 2005 00:45:23 +0200 Received: from cerberus.heimnetz.de (ZXbBS2ZvreRk4z9EGYQQ2ldU2vw9VlX2aiR4AjbeG1X7bQVkZ-O76f@[80.141.214.247]) by fwd28.sul.t-online.de with esmtp id 1E1ttM-2GhGUK0; Mon, 8 Aug 2005 00:45:16 +0200 Received: from [192.168.2.4] (euler.heimnetz.de [192.168.2.4]) by cerberus.heimnetz.de (Postfix on SuSE Linux 7.0 (i386)) with ESMTP id 84E9D1721C for ; Mon, 8 Aug 2005 00:45:16 +0200 (CEST) Message-ID: <42F68F01.5020508@gmx.de> Date: Mon, 08 Aug 2005 00:45:21 +0200 From: r.pluem@t-online.de User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 X-Accept-Language: de, en, de-de, en-gb MIME-Version: 1.0 To: dev@httpd.apache.org Subject: Re: [PATCH] fix incorrect 304's responses when cache is unwritable References: <20050807164843.GA22232@stdlib.net> <42F66813.3020606@gmx.de> <20050807212924.GA24773@stdlib.net> In-Reply-To: <20050807212924.GA24773@stdlib.net> X-Enigmail-Version: 0.90.2.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 8bit X-ID: ZXbBS2ZvreRk4z9EGYQQ2ldU2vw9VlX2aiR4AjbeG1X7bQVkZ-O76f X-TOI-MSGID: 647c1edc-cb8f-4ba4-b012-03e409bca2c5 X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Colm MacCarthaigh wrote: > On Sun, Aug 07, 2005 at 09:59:15PM +0200, r.pluem@t-online.de wrote: > >>As you already mentioned the remove_url implementation >>of mod_disk_cache is currently an empty dummy :-). > > > I've been thinking about that, but it's not entirely as easy as it first > seems, or indeed as htcacheclean wrongly assumes. > > unlinking the data/header files is not good enough, directories are also > resources, and consume space in the inode table. Also, most filesystems > slow down as the ammount of hard-links in any directory increases. Indeed a very valid point. So, yes as far as possible all directories should be removed, when the header and the data files get removed. But additionally my experience with some filesystem types is that removing files from a directory is not enough to speed things up, because the directories itself do not shrink even if entries get removed from the directory (I noticed this behaviour with ext3 on Linux and ufs / Veritas FS on Solaris, wheras reiserfs shrinks). This could pose a problem to the cache root where all the temporary data files get created before being moved to the correct hashed directory. [..cut..] > > Because of Windows and non-standard filesystems like AFS (read the GNU > find(1) manpage section on -noleaf) assumptions about the directory link > counts can't be made, which means a full-scale readdir() and directory > traversal to remove the cache entry. This can be pretty a prettly slow > operation. Possibly slow enough to merit implementing it in an > APR_HOOK_REALLY_LAST hook, so that it can avoid slowing down the request > serving. Is a traversal really needed? What about going back the full path of the header / data file to the cache root and removing each component on the way by calling apr_dir_remove on each component until it fails? [..cut..] >> >>I would really appreciate if you find some time to review my patch. > > > I'm not a committer, so my review is only informational. But I'm > familiar enough with the cache mode, I've been running and patching it > for my own purposes for years, so I've taken a look. I also appreciate comments from non-commiters, as - this will (hopefully) draw the attention of the commiters. - improves the patch > > I'm not really sure what it is aiming to do in relation to 404's. 404's > are never saved to the cache in the first place, the check in > mod_cache.c sorts that out; > > if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE > && r->status != HTTP_MULTIPLE_CHOICES > && r->status != HTTP_MOVED_PERMANENTLY > && r->status != HTTP_NOT_MODIFIED) { > /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410 > * We don't cache 206, because we don't (yet) cache partial responses. > * We include 304 Not Modified here too as this is the origin server > * telling us to serve the cached copy. > */ > reason = apr_psprintf(p, "Response status %d", r->status); > } > > Are 404's being served incorrectly in some circumstances? You are right that 404's do not get cached. But if a cached resource vanishes on the backend the cache entry is not removed. This - fills up the cache - leads to the delivery of the cached resource when the request does not force mod_cache to revalidate the entry. This is even the case *if* there had been a request before which forced mod_cache to revalidate the cache entry and the revalidation found out that the resource has vanished on the backend. > > In any case; > > Your remove_url code in mod_disk_cache takes the approach of just > unlinking the data and header file I discussed above. But well, since > so does htcacheclean, that's no reflection on the quality of the patch. See my comments above. Your points are very valid. As I do not want to put too much into a single patch I currently do not intend to adjust this, but I definitely will investigate this for a second step. > > I don't think the cache_removal_url filter is neccessary, the > cache_save_filter already has a lot of code in place for handling > the case of a stale cache handle where I think it is better placed. It is needed because: - In the case of an internal Apache 404 error page the content filter chain is not run (especially not CACHE_SAVE_FILTER). This is the reason why cache_removal_url is a protocol filter. - In the case of an user specified error page with ErrorDocument the CACHE_SAVE_FILTER is run with the wrong request (the one that belongs to the custom error page, not the one of the original request). [..cut..] Regards R�diger