httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Niklas Edmundsson <ni...@acc.umu.se>
Subject Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself
Date Fri, 09 May 2008 20:38:11 GMT
On Fri, 9 May 2008, Graham Leggett wrote:

>> There is a lot of valuable stuff in your jumbo patch - but I am not sure 
>> what the best approach is to fold it in.
>> 
>> Could you have have a look at the rough patch I posted earlier today - and 
>> let me know if you have any thoughts
>> as to which parts should be moved 'up' -- and hence be of use to other 
>> caching backends as well - and which
>> parts are pure disk optimized/specific ?
>> 
>> Or perhaps form an option if we need multiple disk cache modules - each 
>> optimized differently (e.g. for large files on multiple spindles; versus 
>> very 'hot' cache which is virtually living on a meory disk).
>
> I think the safest option is to develop it as another disk cache module in 
> parallel to the current one. It would definitely help if the jumbo patch 
> could be updated alongside the current batch of changes.

My jumbopatch was rather singlemindedly aimed at making the disk cache 
work with our usecase: caching large files on machines with 
not-that-much ram (ie. 4GB DVD images on machines with 3GB RAM). The 
current mod_disk_cache simply sucks at this, to the point it's 
completely unusable since all it does is start caching the file once 
per access, filling your disk and eating all your memory/address space 
and then segfault.

With that in mind, there are a few things that without a doubt should 
benefit "vanilla" mod_disk_cache, like the cleanup of on-disk-format 
(storing of headers, data types used for sizes), cleanup of various 
functions and probably other things. Those things could probably be 
merged to trunk without too much debate, given that I can find time to 
do it.

Also, we want the cache to be usable by other parties. We have a 
preloaded library that makes rsync able to use it, and an improved 
version that makes vsftpd work needs some testing before we put it to 
use.

Changing the absolutely silly defaults for length and number of 
subdirectories in the cache hierarchy would probably also be wise. You 
should never have to delete cache directories like is being done now, 
if you have to you're creating way too many (or are using a crappy 
filesystem without indexed dirs). CacheDirLength 1 and CacheDirLevels 
2 gives you 4096 directories (64^2), that should be more than enough 
for a competent filesystem.

If I remember correctly the parts of the patch causing commotion was:
- Getting rid of the temporary files and caching directly to the
   destination file (requires O_EXCL which supposedly doesn't work on
   Windows, "too much and ugly code" and other comments).
- The background-caching-stuff. Yes, I know that forking a new
   thread/process isn't pretty but that was the easy way to do it. A
   service-process that does the background-caching is probably more
   elegant but involved much more httpd magic that I'm clueless about.
- Using fstat() in the read-while-caching buckets, people wanted
   inotify-stuff for this even though there is no support in APR for
   it.

The thing is - this is what makes the thing work for using the cache 
with large files, and I'm not convinced that it fundamentally breaks 
stuff for small-file-cache-people.

Oh, and for a reference that the jumbo patch is at least usable: It 
has served approx 4220 PB during the time we have used it, with 
some 90-95% cache reuse :)

That said I'm very well aware that my code is not the ultimate 
solution, it's more of a proof that there's some odds and ends missing 
in apr/httpd/whatever infrastructure to do this in an elegant way. 
However, obviously still possible to do in a single module without 
having to patch your way through httpd.

I'm not convinced that forking the disk cache having two similar ones 
tuned for different usecases is a good idea in the long run, I'm 
pretty sure that the parts that needs tweaking can be solved with 
config options and documentation. For a development sprint like this 
it seems sane though, although one option could be to simply do a 
branch of modules/cache and go wild with the controversial parts until 
all parties are satisfied.


/Nikke - will have another glass of wine now, I'll fail miserably in
          Guitar Hero anyhow ;)
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
  When all else fails, read the manual.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Mime
View raw message