httpd-dev mailing list archives

From Niklas Edmundsson <ni...@acc.umu.se>
Subject Re: mod_cache: store_body() bites off more than it can chew
Date Mon, 06 Sep 2010 12:09:26 GMT
On Mon, 6 Sep 2010, Graham Leggett wrote:

<snip>
>> For those who have forgotten, that's what we do in our 
>> large-file-caching-patchset for mod_disk_cache (hidden as an attachment to 
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=39380 but I should 
>> really get around to upload an up2date version that applies cleanly to the 
>> current 2.2 release). Some of the solutions there aren't really applicable 
>> to httpd proper (mostly workarounds for missing infrastructure), but some 
>> ideas are rather sane (like writing the header files in a single go with an 
>> iovec with null terminated strings instead of crlf-stuff that needs to be 
>> parsed). Oh, and the design caters for a shared data cache (ftp and rsync 
>> access uses the same cache), which isn't really a priority for something in 
>> httpd proper.
>
> Given that the make-cache-writes-atomic problem requires a change to the data 
> format, it may be useful to look at this now, before v2.4 is baked, which 
> will happen soon.

Indeed.

While we're at it, it might make sense to replace arch-specific data 
types like int and apr_size_t with apr_int32_t and such. Most people 
would have made the 32/64 bit transition already though, so it might 
be a non-issue.
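
As a rough sketch of the idea (not the actual mod_disk_cache format), an on-disk structure built only from fixed-width types could look like the following. The struct name and every field are hypothetical, and the <stdint.h> types stand in for the APR equivalents such as apr_int32_t.

```c
#include <stdint.h>

/* Hypothetical on-disk header, for illustration only. Every member has
 * an explicit width, so 32-bit and 64-bit builds agree on the layout. */
typedef struct {
    uint32_t format;      /* on-disk format version */
    int32_t  status;      /* HTTP status of the cached response */
    uint32_t header_len;  /* bytes of header strings after the struct */
    uint32_t reserved;    /* explicit padding, keeps 64-bit members aligned */
    uint64_t body_size;   /* expected size of the body file */
    int64_t  expire;      /* expiry time, e.g. microseconds since epoch */
} disk_hdr_sketch_t;

/* Fail the build if a compiler inserts unexpected padding (C11). */
_Static_assert(sizeof(disk_hdr_sketch_t) == 32,
               "on-disk header layout changed");
```

Byte order is still host-dependent in this sketch, so it only helps when the cache is shared between builds on the same kind of machine.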

Another good thing to have would be the filename of the matching 
data/body file. httpd mod_disk_cache hashes this from the URL, but 
there may be smarter ways to do this at cache-time, which requires the 
resulting filename to be stored (for example, we use dev/inode on plain 
files to reduce data duplication when caching DVD images with dozens 
of known URLs). The size of that file is also good to have: on mismatch 
the cache is out of sync/corrupted (unless the file is being written, 
but then we know enough to start answering the query from cache).
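
A minimal sketch of that size check, assuming the header has already given us the body path and the expected size. The function and enum names are made up here, and treating "smaller than expected" as "still being written" is an assumption of this sketch; a real implementation would want a stronger signal than size alone.

```c
#include <stdint.h>
#include <sys/stat.h>

typedef enum { BODY_OK, BODY_INCOMPLETE, BODY_MISMATCH } body_state_t;

/* Compare the body file's actual size with the size recorded in the
 * header file. */
static body_state_t check_body(const char *path, uint64_t expected_size)
{
    struct stat st;

    if (stat(path, &st) != 0 || st.st_size < 0)
        return BODY_MISMATCH;       /* missing or unreadable body file */
    if ((uint64_t)st.st_size == expected_size)
        return BODY_OK;
    if ((uint64_t)st.st_size < expected_size)
        return BODY_INCOMPLETE;     /* assumed: writer has not finished */
    return BODY_MISMATCH;           /* larger than recorded: out of sync */
}
```

On BODY_INCOMPLETE a caller could start serving the bytes that are already there; on BODY_MISMATCH it would discard the entry.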

We also save r->filename so it can be filled in when replying to a 
query (I think to make logging of filenames work).

> How much of a performance boost is the use-null-terminated-strings?

As CPU is cheap nowadays, not much in end-to-end performance, but the 
logic of figuring out whether a header file is correct/complete 
becomes much easier when you construct the entire .header-file in an 
iovec, place the total header length in the on-disk structure, and 
then write it out.

Reading it in becomes reading the main data structure, and then reading 
whatever length the structure indicates as headers. If you get more or 
less than the data structure says, then something is wrong and you can 
either retry (if the header seems to be currently being written and the 
iovec size is too small so it takes multiple writes; but as the 
current mod_disk_cache code uses temporary files that's a non-issue) 
or discard it.

The current text-ish .header files offer no way of verifying the 
integrity of the header file, and store_table()/read_table() carry 
quite a lot of complexity when just handling the null-terminated 
strings as-is would do nicely.
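
For comparison, walking a buffer of back-to-back NUL-terminated strings needs very little code. The sketch below assumes the strings alternate key, value, key, value; that layout, and the names walk_pairs() and pair_cb, are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

typedef void (*pair_cb)(const char *key, const char *val, void *ctx);

/* Walk key/value pairs stored as consecutive NUL-terminated strings.
 * Calls cb (if non-NULL) for each pair and returns the number of pairs,
 * or -1 if the buffer is not NUL-terminated or a key has no value. */
static int walk_pairs(const char *buf, size_t len, pair_cb cb, void *ctx)
{
    const char *p = buf, *end = buf + len;
    int pairs = 0;

    if (len == 0)
        return 0;
    if (buf[len - 1] != '\0')
        return -1;                 /* strlen() below could overrun */
    while (p < end) {
        const char *key = p, *val;

        p += strlen(p) + 1;
        if (p >= end)
            return -1;             /* key without a value */
        val = p;
        p += strlen(p) + 1;
        if (cb)
            cb(key, val, ctx);
        pairs++;
    }
    return pairs;
}
```

No copying or tokenizing is needed: the key and value pointers point straight into the buffer that was read from disk.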

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
  After three days of intense pain, the snake died. * Riker
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
