httpd-dev mailing list archives

From Igor Tatarinov <tatar...@prairie.NoDak.edu>
Subject Want to add file caching to Apache
Date Wed, 19 Nov 1997 21:13:01 GMT
Hi there,

I've already discussed my idea with Dean and he eventually agreed that
it makes a certain amount of sense. But I would like to get more opinions on it, so
please let me know what you think.

I am suggesting, and willing to implement, a document (file) cache in Apache. (I
am a Ph.D. student doing research in Web caching, so
this could help my dissertation; if I ever write one :)

First, let me explain what I mean by a file cache. It is not reusing
mmapped files. Instead, I am suggesting allocating a shared memory segment
(1-64M, maybe even larger) that would store copies of frequently requested
files. It is relatively easy to get a high hit ratio (>80%) in a Web server
cache (see, for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps;
you don't need to read it entirely, just look at the graphs).
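
To make this concrete, here is a rough sketch (in C) of the kind of layout I
have in mind: one shared segment holding a small table of entries plus a data
area of fixed-size slots. All the names below (file_cache, cache_entry, and so
on) are made up for illustration; this is not Apache code, just the idea.

#include <stddef.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>

#define SLOT_SIZE   (64 * 1024)          /* one data slot per cached body   */
#define MAX_ENTRIES 256                  /* 256 * 64K gives a 16M data area */

typedef struct {
    int    in_use;
    char   uri[256];                     /* request URI this entry answers */
    size_t len;                          /* length of the cached body      */
    time_t last_used;                    /* for LRU replacement            */
} cache_entry;

typedef struct {
    cache_entry entries[MAX_ENTRIES];    /* entry i owns data slot i       */
} file_cache;

/* Body of entry i lives right after the header, in slot i. */
static char *slot_data(file_cache *c, int i)
{
    return (char *)c + sizeof(file_cache) + (size_t)i * SLOT_SIZE;
}

/* Map one shared segment before fork() so every child process sees it. */
static file_cache *cache_create(void)
{
    size_t total = sizeof(file_cache) + (size_t)MAX_ENTRIES * SLOT_SIZE;
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0, sizeof(file_cache));
    return (file_cache *)p;
}

/* Linear scan keeps the sketch short; a real cache would hash the URI. */
static cache_entry *cache_lookup(file_cache *c, const char *uri)
{
    int i;
    for (i = 0; i < MAX_ENTRIES; i++)
        if (c->entries[i].in_use && strcmp(c->entries[i].uri, uri) == 0)
            return &c->entries[i];
    return NULL;
}

(On systems without MAP_ANONYMOUS this would be SysV shm or an mmap of
/dev/zero instead, but the idea is the same.)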

The main benefits are that
+ in the ~80% of cases that are hits, we don't need to open, mmap (or copy), and close
the file.
+ the file system's cache policy is not well suited to this workload (a single large
requested file can flush the entire cache).

Let me emphasize that we don't have to cache _all_ requested files. A smart
cache policy would instead admit only smaller, more popular files.
Unfortunately, that may be hard to implement. But there is a simple
policy, LRU+threshold, that simply refuses to cache large files and still performs pretty well.
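
To sketch what I mean (again hypothetical code, building on the structures
above): the threshold part is just a size check at admission time, and the LRU
part picks a victim slot only when no free slot is left.

/* Threshold part: refuse to cache anything bigger than one slot. */
static int cache_should_admit(size_t file_len)
{
    return file_len <= SLOT_SIZE;
}

/* LRU part: use a free slot if there is one, otherwise reuse the slot of
 * the entry that was used least recently. */
static int cache_pick_slot(file_cache *c)
{
    int i, victim = 0;
    for (i = 0; i < MAX_ENTRIES; i++) {
        if (!c->entries[i].in_use)
            return i;                    /* free slot, no eviction needed */
        if (c->entries[i].last_used < c->entries[victim].last_used)
            victim = i;
    }
    return victim;                       /* evict the least recently used */
}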

The possible problems are:
- whenever we cache a document, an extra memcpy has to be done.
True, but it only happens on a cache miss (20% of requests, and actually
fewer, since not every miss results in caching the requested file); see the
sketch right after this list.
- additional synchronization overhead.
Also true, but I think the benefits will outweigh it. The real problem is that
cheap inter-process synchronization mechanisms are only available on Solaris
(once Apache becomes multithreaded, synchronization will be simpler).
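
Here is roughly how both points would look in code (hypothetical names again;
cache_lock/cache_unlock are only placeholders for whatever inter-process
mechanism we end up with: SysV semaphores, fcntl() locks, or a mutex once
Apache is multithreaded). A hit is served straight out of the segment; the
extra memcpy happens only in the insert path after a miss.

#include <unistd.h>

static void cache_lock(file_cache *c)   { (void)c; /* acquire lock here */ }
static void cache_unlock(file_cache *c) { (void)c; /* release lock here */ }

/* HIT path: write straight from the shared segment, no open/mmap/close. */
static int cache_serve(file_cache *c, const char *uri, int client_fd)
{
    cache_entry *e;
    int hit = 0;

    cache_lock(c);
    e = cache_lookup(c, uri);
    if (e != NULL) {
        write(client_fd, slot_data(c, (int)(e - c->entries)), e->len);
        e->last_used = time(NULL);
        hit = 1;
    }
    cache_unlock(c);
    return hit;       /* on a miss the caller serves the file the usual way */
}

/* MISS path: the one extra memcpy, paid only when the policy admits the file. */
static void cache_insert(file_cache *c, const char *uri,
                         const char *body, size_t len)
{
    int slot;
    cache_entry *e;

    if (!cache_should_admit(len))
        return;                          /* policy: skip large files */

    cache_lock(c);
    slot = cache_pick_slot(c);
    e = &c->entries[slot];
    strncpy(e->uri, uri, sizeof(e->uri) - 1);
    e->uri[sizeof(e->uri) - 1] = '\0';
    e->len = len;
    e->last_used = time(NULL);
    memcpy(slot_data(c, slot), body, len);   /* the extra copy */
    e->in_use = 1;
    cache_unlock(c);
}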

Finally, let me mention the killer idea (presented in the paper referenced
above). I know most of you will not like it, but I still believe it is nice
and useful. The paper talks about static caching, that is, filling the cache
once a day (week, etc.) and not replacing anything until the end of the day. As
odd as it may look, this policy often performs better than anything else. Its
main advantage is that there is absolutely no extra overhead during the entire
day. Refilling the cache should take no more than a minute, so that shouldn't be a
problem.
One may argue that with this policy, new documents are not cached until the end
of the day. That's true, but the number of sites that continuously create new
pages (a la CNN) is really small.
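
For what it's worth, the refill in the static variant could be as simple as
something like this (again just a sketch with made-up helpers; the uris/paths
lists would come from the previous day's log analysis):

#include <stdio.h>
#include <stdlib.h>

/* Once a day: drop everything, then load the most popular small files, in
 * order of popularity, until the slots run out; leave the cache alone for
 * the rest of the day. Assumes the server is quiescent during the refill. */
static void cache_static_fill(file_cache *c,
                              const char *const *uris,
                              const char *const *paths, int n)
{
    int i, filled = 0;
    char *buf = malloc(SLOT_SIZE);

    if (buf == NULL)
        return;
    memset(c, 0, sizeof(file_cache));        /* forget yesterday's contents */
    for (i = 0; i < n && filled < MAX_ENTRIES; i++) {
        FILE *fp = fopen(paths[i], "rb");
        size_t len;
        if (fp == NULL)
            continue;
        len = fread(buf, 1, SLOT_SIZE, fp);
        fclose(fp);
        if (len == 0 || len == SLOT_SIZE)    /* empty, or may not fit a slot */
            continue;
        cache_insert(c, uris[i], buf, len);  /* reuses the insert sketch */
        filled++;
    }
    free(buf);
}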

Well, seems like I've said too much already. 

Thanks for reading
Waiting for feedback
igor
