httpd-dev mailing list archives

From Niklas Edmundsson
Subject mod_disk_cache patch, preview edition (was: new cache arch)
Date Tue, 02 May 2006 12:03:12 GMT
On Tue, 2 May 2006, Graham Leggett wrote:

>> I've been hacking on mod_disk_cache to make it:
>> * Only store one set of data when one uncached item is accessed
>>    simultaneously (currently all requests cache the file and the last
>>    finished cache process "wins").
>> * Don't wait until the whole item is cached, reply while caching
>>    (currently it stalls).
>> * Don't block the requesting thread when requesting a large uncached
>>    item, cache in the background and reply while caching (currently it
>>    stalls).
> This is great, in doing this you've been solving a proxy bug that was
> first reported in 1998 :).

OK. Stuck in the "File under L for Later" pile? ;)
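
For readers following along: one common way to get the "only one request caches the item" behaviour from the quoted list is an exclusive file create, where the first request to create the cache file becomes the cacher and every other request reads from that file. The sketch below is illustrative only and is not taken from the patch; the function name try_become_cacher is hypothetical.

```python
import os

def try_become_cacher(path):
    """Return a file descriptor if this caller created the cache file.

    Returns None when another request already created it, in which case
    the caller should read from the existing file instead of caching again.
    Hypothetical helper; the patch may use a different mechanism.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so exactly
        # one concurrent caller can win the right to write the entry.
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return None
```

The winner writes the entry and closes the descriptor; losers open the same path read-only and serve from it.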

> The only things to be careful of are for Cache-Control: no-cache and
> friends to be handled gracefully (the partially cached file should be
> marked as "delete-me" so that the current request creates a new cache file
> / no cache file. Existing running downloads should be unaffected by
> this.), and for backend failures (either a timeout or a premature socket
> close) to cause the cache entry to be invalidated and deleted.

I haven't changed the handling of this, so any bugs in this regard 
shouldn't be my fault at least ;)

Regarding partially cached files, it understands when caching a file 
has failed and so on.

>> * More or less atomic operations, so caching headers and data in
>>    separate files gets very messy if you want to keep consistency.
> Keep in mind that HTTP/1.1 compliance requires that the headers be
> updatable without changing the body.

They are. The code seek()s to the offset where the body is stored, so 
headers can be updated as long as they don't grow too much.
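
A minimal sketch of that idea, assuming a single cache file with a fixed-size region reserved for headers and the body starting right after it. The layout, the BODY_OFFSET value and the function names are all hypothetical and are not the format used by the patch.

```python
BODY_OFFSET = 4096  # hypothetical size of the region reserved for headers

def write_entry(path, headers: bytes, body: bytes) -> None:
    """Create a cache file: padded header region, then the body."""
    if len(headers) > BODY_OFFSET:
        raise ValueError("headers do not fit in the reserved region")
    with open(path, "wb") as f:
        f.write(headers.ljust(BODY_OFFSET, b"\0"))  # pad up to the body offset
        f.write(body)

def update_headers(path, headers: bytes) -> bool:
    """Rewrite the headers in place; the body is never touched.

    Returns False when the new headers have grown past the reserved
    region, in which case the entry would have to be re-cached.
    """
    if len(headers) > BODY_OFFSET:
        return False
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(headers.ljust(BODY_OFFSET, b"\0"))
    return True

def read_entry(path):
    """Return (headers, body), seeking to the body offset for the body."""
    with open(path, "rb") as f:
        headers = f.read(BODY_OFFSET).rstrip(b"\0")
        f.seek(BODY_OFFSET)
        body = f.read()
    return headers, body
```

The trade-off is the one mentioned above: the reserved region bounds how much the headers can grow before an in-place update is no longer possible.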

>> * You can't use tempfiles since you want to be able to figure out
>>    where the data is to be able to reply while caching.
>> * You want to know the size of the data in order to tell when you're
>>    done (ie the current size of a file isn't necessarily the real size
>>    of the body since it might be caching while we're reading it).
> The cache already wants to know the size of the data so that it can decide
> whether it's prepared to try and cache the file in the first place, so in
> theory this should not be a problem.

The need to know the size applies to retrievals as well.

You also have the "size unknown right now" issue, which this patch 
solves by writing a header with the size -1 and then updating it when 
the size is known.
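
Roughly, the placeholder approach can be sketched like this, assuming an 8-byte size field at the start of the file followed directly by the body. The field format and the function names are hypothetical; the patch's real on-disk format is not shown here. A reader that replies while the file is still being cached can only call itself done once the size is known and it has read that many bytes.

```python
import struct

SIZE_FMT = "<q"                       # hypothetical: 8-byte signed size field
SIZE_LEN = struct.calcsize(SIZE_FMT)
UNKNOWN = -1                          # placeholder while the size is unknown

def begin_entry(f):
    """Writer: start the entry with the 'size unknown' placeholder."""
    f.seek(0)
    f.write(struct.pack(SIZE_FMT, UNKNOWN))
    f.flush()

def append_body(f, chunk):
    """Writer: append a body chunk and flush so readers can see it."""
    f.seek(0, 2)
    f.write(chunk)
    f.flush()

def finish_entry(f):
    """Writer: replace the placeholder with the real body size."""
    f.seek(0, 2)
    size = f.tell() - SIZE_LEN
    f.seek(0)
    f.write(struct.pack(SIZE_FMT, size))
    f.flush()
    return size

def read_available(path, already_read):
    """Reader: return (new_bytes, done) given bytes already sent."""
    with open(path, "rb") as f:
        (size,) = struct.unpack(SIZE_FMT, f.read(SIZE_LEN))
        f.seek(SIZE_LEN + already_read)
        data = f.read()
    # The current file length alone says nothing; only a known size does.
    done = size != UNKNOWN and already_read + len(data) >= size
    return data, done
```

A reader would call read_available() repeatedly, sending whatever new bytes it gets, until done is true.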

>> In any case the patch is more or less finished, independent testing
>> and auditing haven't been done yet but I can submit a preliminary
>> jumbo-patch if people are interested in having a look at it now.
> Post it, people can take a look.

OK. It's attached. It has only had mild testing using the worker MPM 
with mmap enabled; it needs a bit more testing and auditing before 
trusting it too hard.

Note that this patch fixes a whole slew of other issues along the way, 
the most notable being LFS (large file support) on 32-bit 
architectures, not eating all of your 32-bit memory/address space when 
caching huge files, and providing r->filename so that %f in LogFormat 
works, plus other smaller issues.

  Niklas Edmundsson, Admin @ {acc,hpc2n}      |
  I am Zirofsky of Borg. I will reassimilate Alaska and Finland.