httpd-dev mailing list archives

From Ruediger Pluem <rpl...@apache.org>
Subject Re: Issues with mod_disk_cache and htcacheclean
Date Mon, 05 Jan 2009 14:50:07 GMT


On 01/04/2009 11:09 PM, Neil Gunton wrote:

> 
> All of this brings up a few questions:
> 
> 1. Why does mod_disk_cache generate six levels of subdirectory when
> CacheDirLevels is clearly set to 3? I realize what it's trying to do,

This is more of a documentation bug than a code bug. The documentation
should clearly state that in the Vary case the directory depth can be
twice the value set by CacheDirLevels.
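
To illustrate where the extra depth comes from, here is a rough sketch.
It is not the module's code: split_levels() and its parameters are
hypothetical names made up for this mail, and the exact on-disk names are
an assumption. It only shows how a hashed name gets spread over a fixed
number of directory levels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from mod_disk_cache: spread the first
 * levels * len characters of name over `levels` directories and keep
 * the remainder as the leaf name.
 * Returns 0 on success, -1 if name is too short or out is too small.
 */
int split_levels(char *out, size_t outlen, const char *name,
                 int levels, int len)
{
    size_t n, used, i = 0, k = 0;
    int d;

    assert(out != NULL && name != NULL && levels >= 0 && len > 0);
    n = strlen(name);
    used = (size_t)levels * (size_t)len;
    /* output needs n characters, one '/' per level and a NUL */
    if (used >= n || n + (size_t)levels + 1 > outlen) {
        return -1;
    }
    for (d = 0; d < levels; d++) {
        memcpy(out + i, name + k, (size_t)len);
        i += (size_t)len;
        k += (size_t)len;
        out[i++] = '/';
    }
    memcpy(out + i, name + k, n - k + 1);  /* includes the NUL */
    return 0;
}
```

With three levels of length one, "abcdefgh" becomes "a/b/c/defgh". In the
Vary case the same kind of split is applied a second time below the .vary
directory of the entry, which is how the depth can double.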

> (each page might have many variations and so those variations must be
> differentiated by subdir) but the additional levels cause an exponential
> increase in the number of directories that must be traversed. It seems
> absurd when this causes trouble for a relatively well-specced server.
> Since starting this investigation, I have moved to a completely new
> server, a 4 core Xeon 2.33GHz, with 8 x 10k Raptor SATA drives in
> hardware RAID10 configuration. The performance is excellent, but when I
> tried using mod_disk_cache with CacheDirLevels at 3 and cookie Vary
> headers on, it still could not keep up with pruning. Even simply
> traversing this kind of structure with du is clearly not scalable. Could
> we not have the three main levels of directory, but then have a
> different setting for the number of subdirs below the .vary dirs?
> Usually there is just one file at the leaf of the .vary subdirs, so
> having three additional levels seems like a bit of overkill. We should
> be able to tune the subdir levels to minimize the depth of the cache as
> makes sense.

What information do your cookies contain? Are these session cookies that
are individual to each client? In that case using mod_disk_cache with
Vary: Cookie would be a bad idea. Since these responses are individual,
you couldn't reuse them for other clients anyway, so it would be best to
leave caching to the individual client caches (e.g. browser caches).
If your cookies are like BACKGROUND=blue for some users and BACKGROUND=red
for other users, you should think about incorporating these differences
into the URLs instead of into varying responses.

Regarding the performance you should take a look at the following:

1. Use a separate filesystem for the cache.
2. Ensure that it is mounted with noatime option.
3. Check whether you are using the right type of filesystem for this job.
   If the individual cache files are rather small, reiserfs can be much
   faster than ext3, if I remember correctly.

> 2. Why does htcacheclean not keep the cache at the stated size limit? If
> you say -l100M and then do a du and it says 200M, then that is
> counter-intuitive, and actually wrong in real terms. It gets worse with
> the larger caches - when I had 3 levels and cookie Vary headers on, the
> limit for htcacheclean was 1000M, but the cache would grow to 3GB and up.

Again, this is an issue with the documentation. In fact htcacheclean does
not limit the size of the cache at all. It can grow indefinitely.
It only ensures that the size of the cache is reduced to at most the
given limit after it has run. The size of the cache is defined as the
sum of all file sizes in the cache. It does not consider the disk usage of
these files, which can be larger, and it also doesn't take the sizes of the
directories into account. I am not sure if a du-like measurement of the
cache size would be implementable in a platform independent way, but I
may be wrong here.

> 3. Why are .header files left over by htcacheclean when it has deleted
> the .vary subdirectory? Is this something like a memory leak, but with
> files? I would have thought that if the cached content (.data) file has
> gone away, then why bother keeping the .header file around. It clogs up
> the cache directory and makes traversing the tree more work. If it's
> kept for 304 "unchanged" responses then I can understand that, but then
> why do these files still seem to pile up even after the related page
> would have clearly expired anyway? Surely better to just delete them
> when the .vary subdir is deleted. In any case, I didn't notice the
> .header files being left over when the Vary header was disabled, so I
> think this might be a straightforward "leak" when using Vary.

This seems to be a bug. Can you please check whether the following patch
fixes this?

Index: support/htcacheclean.c
===================================================================
--- support/htcacheclean.c      (Revision 731535)
+++ support/htcacheclean.c      (Arbeitskopie)
@@ -248,6 +248,7 @@
 {
     char *nextpath;
     apr_pool_t *p;
+    char *cache_root_path;

     if (dryrun) {
         return;
@@ -262,6 +263,49 @@
     nextpath = apr_pstrcat(p, path, "/", basename, CACHE_DATA_SUFFIX, NULL);
     apr_file_remove(nextpath, p);

+    if (deldirs && (apr_filepath_get(&cache_root_path, 0, p) == APR_SUCCESS)) {
+        apr_status_t rc;
+        char *q;
+        char *dir;
+        char *slash;
+        char *dot;
+
+        dir = apr_pstrdup(p, path);
+
+        /*
+         * now walk our way back to the cache root, delete everything
+         * in the way as far as possible
+         *
+         * Note: due to the way we constructed the file names in
+         * process_dir, we are guaranteed that the
+         * cache_root_path is suffixed by at least one '/' which will be
+         * turned into a terminating null by this loop.  Therefore,
+         * we won't either delete or go above our cache root.
+         */
+        for (q = dir + strlen(cache_root_path); *q ; ) {
+            rc = apr_dir_remove(dir, p);
+            delcount++;
+            if (rc != APR_SUCCESS && !APR_STATUS_IS_ENOENT(rc)) {
+                break;
+            }
+            slash = strrchr(q, '/');
+            *slash = '\0';
+            /*
+             * Check if we just deleted a vary directory. If we did, the
+             * corresponding header file is of no use anymore. So delete
+             * it.
+             */
+            dot = strrchr(slash + 1, '.');
+            if (dot && (strcmp(dot + 1, CACHE_VDIR_SUFFIX) == 0)) {
+                *dot = '\0';
+                nextpath = apr_pstrcat(p, dir, "/", slash + 1,
+                                       CACHE_HEADER_SUFFIX, NULL);
+                apr_file_remove(nextpath, p);
+                delcount++;
+            }
+        }
+    }
+
     apr_pool_destroy(p);

     if (benice) {
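
For readers who don't want to dig through the APR calls, the walk back
towards the cache root can be sketched in plain C as below. This is an
illustration only, not part of the patch: up_one_level() and
is_vary_dir() are hypothetical helpers, and VDIR_SUFFIX is my assumption
that the suffix string includes the leading dot, which is why the
comparison starts at the dot itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VDIR_SUFFIX ".vary"  /* assumption: suffix includes the dot */

/* Returns 1 if the last component of path ends in VDIR_SUFFIX. */
int is_vary_dir(const char *path)
{
    const char *base, *dot;

    assert(path != NULL);
    base = strrchr(path, '/');
    base = base ? base + 1 : path;
    dot = strrchr(base, '.');
    return dot != NULL && strcmp(dot, VDIR_SUFFIX) == 0;
}

/*
 * Truncate path in place to its parent directory, never going above
 * the first root_len characters (the cache root).
 * Returns 1 if a component was removed, 0 if path is already the root.
 */
int up_one_level(char *path, size_t root_len)
{
    char *slash;

    assert(path != NULL);
    if (strlen(path) <= root_len) {
        return 0;
    }
    slash = strrchr(path + root_len, '/');
    if (slash == NULL) {
        return 0;
    }
    *slash = '\0';
    return 1;
}
```

A caller would try to remove the directory, call up_one_level(), and
whenever is_vary_dir() was true for the directory just removed, also
remove the header file that belongs to it.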


> 4. Will I be causing any potential problems for Apache by my deleting
> the leftover .header files myself (ones which have no corresponding
> .vary subdir)? Does that cause apache or htcacheclean to have potential
> issues if you do this while they are running? If they are junk then I
> can't see it being a problem, but it's unclear currently if they are
> actually used or not.

IMHO not. The patch above does the same.

Regards

Rüdiger
