httpd-dev mailing list archives

From Issac Goldstand <>
Subject Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Date Wed, 20 Sep 2006 13:33:36 GMT

Graham Leggett wrote:
> Niklas Edmundsson wrote:
>> However, I don't see how you can do a lockless design with multiple 
>> files and an index that can do:
>> * Clients read from the cache as files are being cached.
>> * Only one session caches the same file.
>> * Header/Body updates.
>> * No index/files out-of-sync issues. Ever.
> Thinking about this some more I do see a race during purging - a cache 
> thread could read the header, the purge deletes header and body, and 
> then the cache thread reads the body, and interprets the missing body 
> as "the body is still coming".
> One possible (and reasonably simple) solution would be to cache the 
> header and body in a unique directory - the directory name becomes the 
> key, and the entry is either cached completely / still being cached if 
> the directory exists. This assumes it's possible to atomically delete 
> directories.
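A minimal sketch of the directory-per-entry idea, in Python for brevity (mod_cache itself is C). It assumes a hypothetical layout <cache_root>/<key>/header and <cache_root>/<key>/body, and POSIX rename() semantics. Removing a non-empty directory is not itself atomic, so this sketch substitutes a common workaround: the purge renames the directory aside in one step and deletes its contents afterwards. As a side effect, mkdir() failing on an existing directory means only one session can claim a given key. All function names are illustrative.

```python
import os
import shutil
import uuid

# Hypothetical layout: <cache_root>/<key>/header and <cache_root>/<key>/body.
# Assumes POSIX rename() semantics: renaming a directory within one
# filesystem is atomic, so readers see either the old name or nothing.

def begin_store(cache_root: str, key: str):
    """Claim an entry by creating its directory.

    mkdir() fails if the directory already exists, so only one writer wins.
    Returns the entry directory, or None if another session is caching it.
    """
    path = os.path.join(cache_root, key)
    try:
        os.mkdir(path)
    except FileExistsError:
        return None
    return path

def write_entry(path: str, header: bytes, body: bytes) -> None:
    """Write header and body into a claimed entry directory."""
    with open(os.path.join(path, "header"), "wb") as f:
        f.write(header)
    with open(os.path.join(path, "body"), "wb") as f:
        f.write(body)

def is_cached(cache_root: str, key: str) -> bool:
    """An entry exists (complete or still being cached) iff its directory exists."""
    return os.path.isdir(os.path.join(cache_root, key))

def purge(cache_root: str, key: str) -> bool:
    """Make the entry disappear in one step, then delete its files."""
    path = os.path.join(cache_root, key)
    trash = os.path.join(cache_root, ".trash-" + uuid.uuid4().hex)
    try:
        os.rename(path, trash)  # the atomic step: the key vanishes here
    except FileNotFoundError:
        return False            # already purged by someone else
    shutil.rmtree(trash)        # non-atomic cleanup, no longer visible under the key
    return True
```
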
I don't see why we should bother getting so complex.  Touch/truncate 
the body file when storing the header; then a missing body means things 
have gone amok, so retry the request.  Conversely, a zero-length body, 
or one shorter than the Content-Length, means another thread is still 
working on the body.
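A minimal sketch of the reader-side decision for the truncate-on-store approach. It assumes the cached header carries a Content-Length (responses without one would need some other completion marker), and that the body is truncated before the header is written, so a header is never visible without a body file beside it. The names store_header() and body_state() are hypothetical.

```python
import os
from enum import Enum

class BodyState(Enum):
    MISSING = "missing"          # things have gone amok: retry the request
    IN_PROGRESS = "in_progress"  # another thread is still writing the body
    COMPLETE = "complete"

def store_header(header_path: str, body_path: str, header: bytes) -> None:
    """Touch/truncate the body, then write the header.

    The body-first ordering is an assumption of this sketch: it guarantees
    that a visible header always has a body file next to it.
    """
    open(body_path, "wb").close()   # create, or truncate to zero length
    with open(header_path, "wb") as f:
        f.write(header)

def body_state(body_path: str, content_length: int) -> BodyState:
    """Classify the body file relative to the Content-Length from the header."""
    try:
        size = os.stat(body_path).st_size
    except FileNotFoundError:
        return BodyState.MISSING
    if size < content_length:       # zero-length or short: still being written
        return BodyState.IN_PROGRESS
    return BodyState.COMPLETE
```
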

> Another option is to version the filename of the body based on a key 
> in the header. In other words, in the header, called <key>.header, is 
> a version number <timestamp>, meaning there should be a body called 
> <key>.<timestamp>.body. A replacement cached entry therefore cannot 
> stomp on what pre-existing threads are doing. If the body file is 
> created first, before the header file, then a non-existent body file 
> means "this entry has been invalidated, try the request again".
> There is an assumption that <timestamp> is fine grained enough to be 
> unique.
> You're right, this is a tricky one, but there is a solution out there.
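A minimal sketch of the versioned-body scheme, assuming a hypothetical header file format whose first line is the version, followed by the stored HTTP headers. As the message notes, it also assumes a (here nanosecond) timestamp is unique enough. The body is written before the header, so a header naming a body that no longer exists means the entry was invalidated and the request should be retried. The names store(), lookup() and StaleEntry are illustrative.

```python
import os
import time

class StaleEntry(Exception):
    """The header names a body version that no longer exists: retry the request."""

def store(cache_dir: str, key: str, headers: bytes, body: bytes) -> str:
    # Assumption (as in the message): the timestamp is fine grained enough.
    version = str(time.time_ns())
    # 1. Create the versioned body first.
    with open(os.path.join(cache_dir, f"{key}.{version}.body"), "wb") as f:
        f.write(body)
    # 2. Then publish the header naming that version.  Writing a temporary
    #    file and renaming it means readers never see a half-written header.
    tmp = os.path.join(cache_dir, f"{key}.header.{version}.tmp")
    with open(tmp, "wb") as f:
        f.write(version.encode() + b"\n" + headers)
    os.replace(tmp, os.path.join(cache_dir, f"{key}.header"))
    # Older <key>.<ts>.body files are left for readers already using them;
    # removing them is the garbage collector's job.
    return version

def lookup(cache_dir: str, key: str):
    """Return the body bytes, or None if not cached; raise StaleEntry to retry."""
    try:
        with open(os.path.join(cache_dir, f"{key}.header"), "rb") as f:
            version = f.readline().strip().decode()
    except FileNotFoundError:
        return None
    try:
        with open(os.path.join(cache_dir, f"{key}.{version}.body"), "rb") as f:
            return f.read()
    except FileNotFoundError:
        raise StaleEntry(key) from None
```

A replacement entry only ever writes a new body file and swaps the header, so a thread that already resolved the old version keeps reading the old body undisturbed.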
Maybe we're attacking the problem from the wrong angle.  Rather than 
modifying mod_cache, modify the garbage collector (e.g., htcacheclean).  
Do a two-pass cleanup.  The first pass is a data-store traversal pass 
which decides what to remove.  It immediately purges the header file 
and stores the entity key (or filename, or whatever it needs to 
re-access the entity) in a list.  Once the first pass finishes, a second 
pass leisurely cleans up all of the entities that are still missing 
their header files (that way, if a mod_cache thread re-caches the 
entity in the meantime, we won't purge it).

That should be a safe solution, provided that the time a cache thread 
takes between opening the header file and opening the body file is 
shorter than the time between the cleaner purging that header and 
purging the matching body (at least the remainder of the first pass).  
That should normally be the case; can anyone come up with a realistic 
case where it wouldn't be?
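A minimal sketch of the two-pass cleanup, written as a standalone function rather than as htcacheclean's actual code, which it does not reproduce. It assumes a hypothetical flat <key>.header / <key>.body layout, a hypothetical should_remove(key) policy callback, and an optional between_passes() hook (for example, a sleep to widen the safety margin).

```python
import os

def two_pass_clean(cache_dir, should_remove, between_passes=None):
    """Two-pass purge over a hypothetical <key>.header / <key>.body layout.

    should_remove(key) -> bool is a hypothetical expiry policy callback.
    Returns the keys whose bodies were actually removed.
    """
    # Pass 1: traverse the store, purge headers immediately, remember keys.
    pending = []
    for name in os.listdir(cache_dir):
        if not name.endswith(".header"):
            continue
        key = name[: -len(".header")]
        if not should_remove(key):
            continue
        try:
            os.unlink(os.path.join(cache_dir, name))
        except FileNotFoundError:
            continue  # already gone
        pending.append(key)

    if between_passes is not None:
        between_passes()

    # Pass 2: remove bodies only for entries whose header is still missing.
    removed = []
    for key in pending:
        if os.path.exists(os.path.join(cache_dir, key + ".header")):
            continue  # mod_cache re-cached the entity: leave it alone
        try:
            os.unlink(os.path.join(cache_dir, key + ".body"))
            removed.append(key)
        except FileNotFoundError:
            pass
    return removed
```

Note that a small window remains between the exists() check and the unlink() in the second pass; if mod_cache re-caches the entity exactly then, readers would see a header with a missing body and, per the retry rule above, redo the request.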

