httpd-dev mailing list archives

From Neil Gunton <n...@nilspace.com>
Subject Re: Issues with mod_disk_cache and htcacheclean
Date Mon, 05 Jan 2009 22:17:26 GMT
Ruediger Pluem wrote:
> What information do your cookies contain? Are these session cookies that
> are individual to each client? In this case using mod_disk_cache
> with Vary: Cookie set would be bad. As these responses would be individual
> you couldn't reuse the results anyway for other clients, so it would be
> the best to leave caching to the individual client caches (e.g. browser caches).
> If your cookies are like BACKGROUND=blue for some users and BACKGROUND=red
> for other users you should think of incorporating these differences into
> the URLs instead of into varying responses.

I use two cookies currently - one for user logins and one for options. 
They are independent - people browsing the site may have either, or 
both, or neither set.

I need to cache all dynamically generated content so that the server can 
cope with slashdottings and links from other popular sites where lots of 
people all click on the same link at the same time ("click storms"). 
Such links could go to any page on the site, and so I really need to 
cache almost everything from mod_perl - with the exception of areas of 
the site which are obviously user-specific, such as edit forms, users' 
personal pages and so on. Those are no-cache.

I am very careful about setting expiration times, since on a dynamic 
site you don't want too many stale pages. Many of the indexes (e.g. 
the list of latest journal updates) have an expiration of only 1-3 
minutes, while other journal pages have an expiration of 12 hours or 
more.
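
In the mod_perl handlers this boils down to setting Cache-Control on 
the way out - roughly like this (a sketch only; the variable names 
are illustrative, not from my actual code):

    # Short TTL for fast-moving index pages; no-cache for anything
    # user-specific, such as edit forms or personal pages.
    if ($user_specific) {
        $r->headers_out->set('Cache-Control' => 'no-cache');
    } else {
        $r->headers_out->set('Cache-Control' => 'max-age=120');  # ~2 min
    }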

I keep a 'version' field as part of the database records for most 
content on the site, which is incremented whenever an object is edited. 
Then when someone edits a journal, I include a special 'v=xxx' parameter 
in subsequent links to pages on that journal, to differentiate it from 
earlier versions. So links from the (fast-expiring) index pages, such 
as the forums or journals indexes, quickly pick up the new versioned 
links. This allows me to have extensively cached content while still 
having people see new edits quickly. Thus the cache has fairly high 
turnover.
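
The link generation itself is trivial - something along these lines 
(illustrative names, not the real code):

    # Tag journal links with the record's current version, so cached
    # pages for older versions simply stop being linked to.
    sub page_link {
        my ($doc_id, $version) = @_;
        return "/doc/page/?doc_id=$doc_id&v=$version";
    }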

mod_disk_cache works very well; the only issue is keeping the cache 
size under control without iowait becoming noticeable as a result. I 
have found that keeping the limit down to 100M rather than 1000M, 
setting CacheDirLevels to 2 rather than 3, clearing out the orphaned 
.header files, and running htcacheclean and my header pruning script 
every 10 minutes makes the server very comfortable - the iowait drops 
to unnoticeable levels.
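
For reference, the two cron entries look roughly like this (the prune 
script's path here is just an example, not necessarily where you'd 
keep yours):

    */10 * * * * htcacheclean -t -p/var/cache/www -l100M
    */10 * * * * /usr/local/bin/prune_cache_headers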

All the app-level code here was developed by me. This is a community 
website for bicycle touring journals - www.crazyguyonabike.com. It 
currently sees somewhere north of 100,000 page requests per day, 
according to analog (and that's not including googlebot, which is on 
there constantly). I am very interested in configuring the site to be 
able to run efficiently on one reasonably well-spec'd server. Caching 
dynamic content is a major part of being able to scale well to cope with 
click storms.

> Regarding the performance you should take a look at the following:
> 
> 1. Use a separate filesystem for the cache.
> 2. Ensure that it is mounted with noatime option.
> 3. Check if you are using the right type of filesystem for this job. If the
>    size of the individual cache files is rather small reiserfs can be much
>    faster than ext3 if I remember correctly.

I currently use ext2 with noatime for the main filesystem (including 
the cache). I moved from ext3 back to ext2 because ext3 has the extra 
overhead of maintaining the journal (I believe that is the main 
difference between the two these days). Though I do not have numbers, 
disk performance does seem to have improved since going back to ext2. 
I'm not sure if you can enable dir_index on ext2 without turning it 
into ext3 in the process, but in any case I don't have dir_index 
enabled currently.
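
If dir_index can in fact be enabled on ext2, I'd expect it to be 
something like the following, though I haven't tried it here 
(/dev/sdXN stands in for the cache partition):

    tune2fs -O dir_index /dev/sdXN
    e2fsck -fD /dev/sdXN    # reindex existing directories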

I was aware of the potential for using other filesystems for the cache, 
and had thought about reiserfs as a possibility. However after I wrote 
to the httpd users list a few weeks back asking about this very issue, I 
got zero responses. I then went to the squid group and asked there too, 
and similarly got zero useful responses. I agree that reiserfs might 
handle many small files better, but I am wary of using it since the 
trial of Hans Reiser - it kind of calls the future of the filesystem 
into question, unfortunately.

>> 2. Why does htcacheclean not keep the cache at the stated size limit? If
>> you say -l100M and then do a du and it says 200M, then that is
>> counter-intuitive, and actually wrong in real terms. It gets worse with
>> the larger caches - when I had 3 levels and cookie Vary headers on, the
>> limit for htcacheclean was 1000M, but the cache would grow to 3GB and up.
> 
> Again, this is an issue with the documentation. In fact htcacheclean does
> not limit the size of the cache at all. It can grow indefinitely.
> It only ensures that the size of the cache is reduced to at most the
> given limit after each run. The size of the cache is defined as the
> sum of all file sizes in the cache. It does not consider the disk
> usage of these files, which can be larger, and it also doesn't take
> the sizes of the directories into account. I am not sure if a du-like
> measurement of the cache size would be implementable in a
> platform-independent way, but I
> may be wrong here.

Ok, that's fine. You're right, it sounds like a documentation issue.

> This seems to be a bug. Can you please try if the following patch fixes this?

I applied the patch and rebuilt httpd_proxy successfully. The new 
htcacheclean runs OK, but still seems to leave behind the orphaned 
.header files. At least, I tried running htcacheclean in single-run 
mode, thus:

htcacheclean -t -p/var/cache/www -l100M

Then I ran my prune_cache_headers Perl script, and it still found a 
bunch of orphaned .header files to delete. So the patch doesn't appear 
to have fixed the issue. I did confirm that the patch was applied.
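
For reference, the pruning logic is essentially just this (a 
simplified sketch of my script, not the exact code):

    #!/usr/bin/perl
    # Remove orphaned .header files: ones with neither a corresponding
    # .data file nor a .vary subdirectory alongside them.
    use strict;
    use warnings;
    use File::Find;

    my $root = '/var/cache/www';
    find(sub {
        return unless -f $_ && /\.header$/;
        (my $data = $_) =~ s/\.header$/.data/;
        (my $vary = $_) =~ s/\.header$/.vary/;
        unlink $_ unless -e $data || -d $vary;
    }, $root);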

>> 4. Will I be causing any potential problems for Apache by my deleting
>> the leftover .header files myself (ones which have no corresponding
>> .vary subdir)? Does that cause apache or htcacheclean to have potential
>> issues if you do this while they are running? If they are junk then I
>> can't see it being a problem, but it's unclear currently if they are
>> actually used or not.
> 
> IMHO not. The patch above does the same.

Great, thanks - good to know.

Thanks for your help!

Neil
